REMARKS 

A. Status of the Claims 

Claims 39-50 were pending at the time of the Action. Claims 40, 41, 43, and 48 have 
been amended. Claim 39 has been canceled. Claims 40 and 41 have been amended to depend 
from claim 48. Claim 40 was also amended to correct the spelling of "dipyridodiazepinona" (see,
e.g., Specification, p. 30, ln. 7). Claim 43 has been amended to maintain the proper antecedent
basis with claim 48. Support for the amendment to claim 48 can be found in the specification at, 
for example, page 10, line 24 to page 11, line 5; and page 13, lines 4-9. No new matter has been 
added by these amendments. 

B. The Information Disclosure Statement 

The Action indicates that the information disclosure statement filed on November 26, 
2003 did not include copies of the listed publications. Applicant notes, however, that copies of 
the listed publications are not required according to 37 C.F.R. § 1.98(d). As stated in the 
information disclosure statement, the present application is a divisional application of Serial No. 
09/638,833, filed August 14, 2000. Thus, in accordance with Rule 1.98(d), copies of the listed
references were not required as they have been previously submitted to the Patent and Trademark 
Office in prior application Serial No. 09/638,833. 

Applicant notes that if the references have been lost from the Patent and Trademark 
Office's files, Applicant would gladly have provided another copy of the references had 
Applicant been notified of the loss. Applicant was not, however, notified by the Patent and 
Trademark Office that the references were lost. Applicant further notes that it would be 
improper for any new rejection based on a reference cited in the information disclosure statement 
to be made final, because the reference could have been obtained from the Patent and Trademark 
Office's files and considered prior to the mailing of the present Office Action. 

C. The Objection Under 37 C.F.R. § 1.75(c)

The Action objects to claims 39-47 under 37 C.F.R. § 1.75(c) as being of improper 
dependent form because they refer back to a numerically following claim. The MPEP states that 
"in situations where a claim refers to a numerically following claim and the dependency is clear, 
both as presented and as it will be renumbered at issue, all claims should be examined on the 
merits and no objection as to form need be made. In such cases, the examiner will renumber 
the claims into proper order at the time the application is allowed." MPEP § 608.01(n)(F)
(emphasis added). Although claims 39-47 depend, either directly or indirectly, from claim 48, 
their dependency is clear. Applicants, therefore, respectfully request that the objection be 
withdrawn and that the claims be renumbered at issue. 

D. The Claims Are Definite 

The Action rejects claims 39-50 under 35 U.S.C. § 112, second paragraph, as being 
indefinite. The Action raises three issues in this regard. First, the Action states that it is unclear 
whether the RT being inactivated is a purified RT or an RT in a viral particle. Second, the 
Action states that the preamble of claim 48 fails to set forth an objective. Finally, the Action 
states that the claims are incomplete for omitting essential positive steps. 

The current claims clarify that the RT being inactivated is in a viral particle. In addition, 
the preamble of current claim 48 specifies that it is a method "of eliciting an immune response." 
Current claim 48 also includes in the last step the positive recitation "wherein an immune 
response is elicited in the subject," which refers back to the objective of the method recited in the 
preamble. In view of the above, the current claims are definite. Applicants, therefore, request 
the withdrawal of this rejection. 

E. The Claims Are Supported by Adequate Written Description in the Specification

The Action rejects claims 39-50 under 35 U.S.C. § 112, first paragraph, as containing
subject matter not reasonably described in the specification. In particular, the Action asserts that
the claims encompass large numbers of RT proteins and inactivating compounds, whereas the
specification provides working examples only with HIV RT and a limited number of
photolabeling compounds. Applicants traverse this rejection.

To satisfy the written description requirement, a patent specification must describe the 
claimed invention in sufficient detail that one skilled in the art can reasonably conclude that the 
inventors had possession of the claimed invention. As set forth in detail below, one skilled in the
art would reasonably conclude that the inventor had possession of the currently claimed invention
based on the description provided in the specification.

HIV viruses are members of the Retroviridae family. A unique aspect of retrovirus
replication is the conversion of a single-stranded RNA from the virus genome into a double-stranded
DNA molecule that must integrate into the genome of the host cell prior to the synthesis
of viral proteins and nucleic acids (Specification, p. 3, ln. 4-12). Accordingly, all retroviruses
possess a reverse transcriptase enzyme, which converts the RNA of their genetic material into
DNA. Furthermore, since virtually all reverse transcriptases prime the synthesis of new DNA from
an RNA molecule with abundant secondary structure that is strongly associated with the enzyme,
most often a tRNA, it is generally accepted that the catalytic unit among reverse transcriptases is
phylogenetically conserved (see, e.g., Flavell, Retroelements, reverse transcriptase and evolution,
Comp. Biochem. Physiol. 110B(1):3-15, 1995; Xiong and Eickbush, Origin and evolution of
retroelements based upon their reverse transcriptase sequences, The EMBO Journal 9(10):3353-3362,
1990; Boeke, The unusual phylogenetic distribution of retrotransposons: A hypothesis, Genome Res.
13:1975-1983, 2003; Nakamura et al., Telomerase catalytic subunit homologs from fission yeast and
human, Science 277, August 15, 1997; Springer et al., Phylogenetic relationships of reverse
transcriptase and RNase H sequences and aspects of genome structure in the gypsy group of
retrotransposons, Mol. Biol. Evol. 10(6):1370-1379, 1993; Lingner et al., Reverse transcriptase
motifs in the catalytic subunit of telomerase, Science 276:561, 1997; Valverde-Garduno et al.,
Functional analysis of HIV-1 reverse transcriptase motif C: site-directed mutagenesis and metal
cation interaction, J. Mol. Evol. 47(1):73-80, July 1998; Seifarth et al., Rapid identification of
all known retroviral reverse transcriptase sequences with a novel versatile detection assay, AIDS
Research and Human Retroviruses 16(8):721-729, 2000).

Since retroviruses cannot integrate into the genetic machinery of the host cell without
reverse transcription, inhibiting reverse transcriptase universally prevents any retrovirus from
integrating into the genetic machinery of a suitable host cell.
Thus, regardless of the type of retrovirus, the inactivation of reverse transcriptase as described in
the present specification would be understood by a person of ordinary skill in the art to be 
applicable to any retrovirus. Furthermore, the present specification specifically states that "the 
methodology of the present invention is applicable to any retrovirus which may be associated 
with any animal or human disease as a method for development of effective immunogens and 
preventative vaccines. Thus, the present invention has a broader applicability than the 
exemplified HIV vaccine." (Specification, p. 16, ln. 20-23).

As described in the present specification, reverse transcriptase may be inactivated by
binding the reverse transcriptase to one or more azido-labeled compounds and then irradiating
it (see, e.g., p. 12, ln. 8-9). The written description requirement for a claimed genus may be
satisfied through sufficient description of a representative number of species. Furthermore, it
is not necessary that every permutation within a generally operable invention be effective for
Applicant to obtain a generic claim, provided that the effect is sufficiently demonstrated to
characterize a generic invention. Capon v. Eshhar, 418 F.3d 1349, 1359 (Fed. Cir. 2005).
Numerous compounds that bind to reverse transcriptase are known in the art (see, e.g.,
Specification, p. 3, ln. 23, to p. 4, ln. 11). Many of these compounds are nucleoside analogs,
such as AZT, which inhibit the reverse transcriptase by competing with the naturally occurring
deoxynucleotides needed to synthesize the viral DNA. There are also non-nucleoside inhibitors
(NNIs) of reverse transcriptase. Examples of NNIs provided in the present specification include
nevirapine and its analogues, the pyridinones, the pyridobenzo- and dipyridodiazepinones, the
quinoxalines, and the carboxanilides (Specification, p. 3, ln. 27, to p. 4, ln. 1; p. 21, ln. 21-26).
Additional examples of compounds that bind reverse transcriptase include UC781, 9-AN, UC38,
UC84, UC10, UC82, UC040, HBY 097, calanolide A, and U-88204E (see, e.g., Specification, p.
12, ln. 14-18).

As described in the present specification, the above-mentioned compounds may be
converted to azido photoaffinity labels and utilized for the inactivation of reverse transcriptase
(Specification, p. 21, ln. 21-28). Thus, for example, azido-labeled compounds, such as azido-UC781
and the azido derivatives of 9-AN, UC38, UC84, UC10, UC82, UC040, HBY 097,
calanolide A, and U-88204E, may be used to inactivate reverse transcriptase according to the
methods described in the present specification. Accordingly, the present specification satisfies the
written description requirement by describing a representative number of azido-labeled
compounds that bind reverse transcriptase.

In addition, examples of identifying characteristics of biomolecules listed in the Action
(see p. 5) include chemical structure and binding affinity. The azido-labeled compounds have, at
least, the common structural feature of the azido group and the ability to bind reverse
transcriptase. Thus, the present specification also satisfies the written description requirement by
describing the relevant identifying characteristics of the azido-labeled compounds.

In view of the above, the present specification describes the claimed invention in 
sufficient detail that one of ordinary skill in the art can reasonably conclude that Applicant had 
possession of the claimed invention at the time of filing. 

F. Conclusion 

Applicant believes that this is a full and complete response to the Office Action mailed 
October 5, 2006. Should the Examiner have any questions, comments, or suggestions relating to 
this case, the Examiner is invited to contact the undersigned Applicants' representative at (512) 
536-5654. 

Respectfully submitted, 




Travis M. Wohlers 
Reg. No. 57,423 

FULBRIGHT & JAWORSKI L.L.P. Attorney for Applicants 

600 Congress Avenue, Suite 2400 

Austin, Texas 78701 

(512) 536-5654 

(512) 536-4598 (facsimile) 

Date: March 5, 2007 

INVITED REVIEW

Retroelements, reverse transcriptase and evolution 

Andrew J. Flavell 

Department of Biochemistry, Medical Sciences Institute, The University, Dundee DD1 4HN,



Retroelements are genetic elements that can exist as DNA or RNA or DNA/RNA duplexes. 
Although retroviruses are the best known retroelements, there are many other types, including 
close relatives of retroviruses like LTR retrotransposons, more distant relatives like non-LTR 
retrotransposons, caulimoviruses and hepadnaviruses and elements with virtually no 
similarity, like retrons. Virtually all retroelements are 'selfish DNAs' with no involvement 
with the normal development or maintenance of their host cells, the only known exception 
being telomeres/telomerases, which maintain the ends of chromosomes. Virtually all
retroelements use tRNA, or RNA with strong secondary structure, to initiate their reverse 
transcription. The coincidence between the use of tRNA, a molecule central to the conversion 
of RNA to protein, with reverse transcriptase, an enzyme which is crucial for the conversion 
of RNA to DNA is striking, because RNA probably preceded DNA and protein in evolution. 
It seems plausible that retroelements were present at the genesis of living systems. 

Key words: Retrovirus; Reverse transcriptase; Retrotransposon; Retron; Retroelement; 
Evolution; Telomerase; Caulimovirus; Hepadnavirus; Pararetrovirus. 
Comp. Biochem. Physiol. 110B, 3-15, 1995.



Introduction 

This review is intended to give the reader an overview of the many apparently diverse manifestations of genetic elements containing reverse transcriptase. What emerges from a closer look at these elements is the striking similarity between many of them, suggesting that many, and perhaps all, of these elements share a common evolutionary origin.

The Origin of Reverse Transcriptase

RNA has an intrinsic catalytic ability to make and break its own phosphodiester backbone. We, therefore, believe that RNA was probably the first self-replicating entity and that evolution first worked on it before DNA and protein were brought into the picture (Cech and Bass, 1986; Darnell and Doolittle, 1986).

Although RNA was probably the first genetic material, it is poorly suited to that role because it is chemically labile. DNA is much more inert and better suited to carrying genetic information between generations. RNA can be converted to DNA by reverse transcriptase, an enzyme which is related in sequence to RNA replicases which exist in RNA viruses (Xiong and Eickbush, 1990). We believe that reverse transcriptase evolved from an RNA replicase and that this event was central to the development of DNA as the main form of genetic material in living organisms.

Table 1. Types of retroelement

Telomeres/telomerase
Group II introns
Retrons
Fungal mitochondrial plasmids
Non-LTR retrotransposons
LTR retrotransposons
Retroviruses
Caulimoviruses and hepadnaviruses

Once reverse transcriptase had accomplished this feat, it largely bowed out of the main story of evolution, leaving DNA as the genetic store and RNA as either the message for production of proteins (messenger RNA) or part of the machinery of RNA splicing, polyadenylation, etc., catalysed by ribonucleoprotein complexes, and translation, catalysed by transfer RNAs and ribosomal RNAs. But it did not disappear entirely and is still found in a wide variety of guises. These 'retroelements' are listed in Table 1 and I will first review them in ascending order of sophistication, concentrating mainly on what we know of their replication cycles, before discussing the evolutionary implications of these data.

Telomeres and Telomerases 

Telomeres are the extreme ends of linear chromosomes. Linear chromosomes are ubiquitous (as far as we know) in nuclear eukaryote genomes and they face the same dilemma found by all linear DNAs, namely, how to resist agents which lead to shortening, such as attack by DNA exonucleases or the inherent inability of DNA polymerase to copy its template to its extreme 5' end. The basic solution to the problem for the majority of eukaryotes is to locate a simple sequence at the telomere which counteracts the reduction in size by replicating extra copies of itself (Blackburn, 1992; Schippen, 1993). The exact sequences of the repeats within telomeres are species-specific, but a typical example is that of Tetrahymena thermophila, in which the sequence GGGGTT is repeated many times.



The template for these extra copies is an RNA molecule which is bound tightly to a specialised type of reverse transcriptase called telomerase. In this case, the template RNA carries a complementary sequence (5'AACCCCAA3') which is used to generate a new DNA strand (Fig. 1).

There is still very little known about the telomerase enzyme which carries out this function, because it is difficult to work with. An exciting possibility is that the RNA template itself is directly involved in the catalytic process, making it another RNA enzyme (ribozyme). However, this hypothesis remains untested and no ribozyme to date has been shown to catalyse reactions on DNA. Perhaps RNA never learned this feat and DNA has only ever been 'genetically manipulated' by proteins.

Group II Introns 

Two evolutionarily ancient types of intron survive in the organelles or plasmids of lower eukaryotes. These group I and II introns have the ability to splice themselves out of their precursor mRNAs without the help of proteins (Cech and Bass, 1986). The two groups are classified by their characteristic sequence motifs, which themselves define group-specific conserved secondary structures in the intron RNA. Several of these introns encode 'maturase' genes which aid the splicing process in the cell, although the polypeptides encoded by these genes are not essential for self-splicing in the test tube (Carignani et al., 1983). A variety of polypeptides are used; these are derivatives of enzymes concerned with RNA metabolism (Lambowitz and Perlman, 1990).

Polypeptides resembling aminoacyl tRNA synthetases are common and reverse transcriptase-like proteins are also found. These reverse transcriptase-like proteins have not yet been shown to have enzymic activity, but another strange property of group II introns suggests that this may be the case. Some group II introns occasionally transpose to new chromosomal locations (Mueller et al., 1993; Sellem et al., 1993). These new locations are sometimes non-homologous to pre-existing insertion sites for the introns, suggesting that this is true transposition and not a phenomenon related to homologous recombination.

While other models are possible, it is feasible that the transposition occurs in the following way. First, an RNA copy of the intron becomes inserted into another RNA by a reversal of the normal splicing mechanism (such reverse-splicing has already been demonstrated in group II introns; Augustin et al., 1990). Then the novel intron-containing RNA is reverse transcribed into DNA, which then enters the chromosomal DNA either by direct recombination or by gene conversion.



Retrons 

The bizarre RNA-DNA chimaera shown in Fig. 2 is found in many bacteria (Inouye and Inouye, 1993). This small extrachromosomal molecule, called msDNA, is synthesized in bacterial strains containing a genetic element called a retron (Fig. 3). A retron minimally consists of the DNA template for msDNA synthesis, plus a reverse transcriptase gene. A single promoter transcribes the entire genetic element, producing a long transcript from which reverse transcriptase is synthesized. The transcript can adopt a secondary structure, because it contains several regions of self-complementarity. This folded RNA then acts as a template for reverse transcription by the retron-encoded enzyme. This synthesis seems to be primed from a 2' OH group (starred in Fig. 3), unlike all other known reverse transcriptases, RNA polymerases and DNA polymerases, which prime from 3' OH groups. The reverse transcription becomes stalled at one of the RNA hairpin loops. Degradation of most of the RNA part of the resulting heteroduplex leads to the mature msDNA molecule.

Fig. 4. The Mauriceville mitochondrial plasmid of Neurospora. (a) Secondary structure of the RNA transcript. RNA is shown by thin lines and DNA by thick lines. Bases conserved with tRNAs are shown by small black circles. (b) In vitro initiation of reverse transcription by elongation primed from the 3' end of the RNA. (c) In vitro initiation of reverse transcription by a non-specific DNA oligonucleotide (shown by a striped line). (d) In vitro initiation of reverse transcription by elongation primed by G mononucleoside or mononucleotide.

[Fig. 5 panels: Non-LTR retrotransposon; Ty1-copia group retrotransposon; Retrovirus and gypsy group retrotransposon (gene labels include rt, RNaseH, int and LTR)]

Fig. 5. Gene structural features of retrotransposons and retroviruses. See text for functions of genes.

What is the point of this strange phenomenon? Only a proportion of the members of a retron-containing bacterial species actually contain retrons. We are therefore confident that this genetic element is a kind of 'selfish DNA' (Doolittle and Sapienza, 1980; Orgel and Crick, 1980; Sapienza and Doolittle, 1981) which is non-essential for the host and whose prime function is its own propagation. It seems unlikely that msDNA is an intermediate in an extrachromosomal replication cycle of retrons, because large parts of the complete retron are missing from it (including the reverse transcriptase gene), but it may be the abortive descendant of such an intermediate. We shall see below that most retroelements use reverse transcription to replicate themselves via extrachromosomal intermediates.



Fungal Mitochondrial Plasmids 

Certain fungi (notably some Neurospora species) sometimes contain small circular double-stranded plasmids in their mitochondria. The best characterized of these (the Mauriceville and Varkud plasmids) have been shown to contain reverse transcriptase genes and to be replicated by a transcription-reverse transcription cycle (Fig. 4). Transcription of the plasmid yields an RNA of exactly the same size as the plasmid, with an intriguing secondary structure at the 3' end which is reminiscent of tRNAs and the 3' ends of RNA viruses of plants and bacteria; indeed, several bases conserved among tRNAs (shown by small black circles in the figure) are found in the plasmid RNA (Fig. 4a). In vitro studies using the purified mitochondrial enzyme have shown that reverse transcription can be initiated in three different ways (Wang and Lambowitz, 1993). The first involves elongation from the 3' end of the RNA, perhaps via the secondary structure shown in Fig. 4b. The second way is by elongation of a short, non-specific DNA oligonucleotide which is bound to the reverse transcriptase (Fig. 4c). This time DNA synthesis begins at the penultimate base of the RNA. The final way (Fig. 4d) is identical to the second, except that no exogenous DNA primer is required, just a G base. dGMP, dGDP and dGTP all function in this in vitro reaction. It is unclear which, if any, of these methods of reverse transcription initiation predominates in vivo. At present, we do not know how the full DNA-RNA hybrid, which is synthesized as the first step in the reverse transcription process, is eventually converted into the circular double-stranded plasmid.



Non-LTR Retrotransposons

Retrotransposons are transposable genetic elements which use reverse transcription during their movement to new chromosomal locations. Non-LTR retrotransposons are the simplest type of retrotransposon. They are found in the genomes of the majority of eukaryotes (including mammals, where they are called LINE, L1 or Kpn elements; Hutchinson et al., 1989; Martin, 1991). They contain two discernible genes, one of which is a reverse transcriptase/RNase H gene (rt RNaseH; Fig. 5). A single transcript, initiating at the exact 5' boundary of the retroelement, encodes all the genetic information of the element (Mizrokhi et al., 1988). The details of the transposition cycle are still unclear, but a particle, comprised of proteins encoded by the other gene in the element (marked by ? in the figure) and containing the retrotransposon's RNA and reverse transcriptase, has been identified and is perhaps involved in the reverse transcription and integration process, just as the analogous proteinaceous particles are implicated in the transposition cycles of LTR retrotransposons and retroviruses (see below; Deragon et al., 1990; Martin and Branciforte, 1993). The exact reverse transcription mechanism for non-LTR retrotransposons is still unclear, but it is believed that the reverse transcript becomes inserted at random nicks in the chromosome (Finnegan, 1989).

LTR Retrotransposons and Retroviruses

These two types of retroelement are so similar that I will consider them together. The gene structures of retrotransposons and the DNA forms of retroviruses (called proviruses) are shown in Fig. 5. There are two main groups of LTR retrotransposons, the Ty1-copia group and the gypsy group, named after prototype elements in yeast and Drosophila, respectively (Bingham and Zachar, 1989). The former is structurally the simpler group. Both groups are found in fungi, plants and insects. There has been some debate about the existence of retrotransposons in vertebrates. The elements described in early reports were probably no more than defective retroviruses (Hodgson et al., 1990), but recent comprehensive PCR-based searches have identified Ty1-copia group retrotransposons in fish, amphibia and reptiles (though not mammals and birds as yet; Flavell and Smith, 1992; Flavell et al., 1994).

LTR retrotransposons are known to use intracellular ribonucleoprotein particles as intermediates in their transposition cycles (Shiba and Saigo, 1983; Garfinkel et al., 1985). The protein structural components of the particles are encoded by the gag genes (see Fig. 5). The particles also contain virtually full-length transcripts of the transposons and several enzymes, all of which are retrotransposon-encoded. A protease (encoded by the pr gene) is involved in cleaving the precursor polyprotein into the mature proteins, and a reverse transcriptase first copies the transposon RNA to form an RNA-DNA duplex. For all LTR retrotransposons and retroviruses, initiation of the reverse transcription is from the 3' end of a tRNA physically bound to the template and the enzyme (Bingham and Zachar, 1989; Varmus and Brown, 1989). The next step is the degradation of the RNA in the duplex by ribonuclease H to enable synthesis of the second DNA strand by the reverse transcriptase. Finally, an integrase (encoded by the int gene) catalyses the insertion of the double-stranded DNA copy into the chromosome.

Retroviruses use just the same genes to achieve the same result as LTR retrotransposons (Varmus and Brown, 1989). The only significant functional difference between the two is the retroviruses' proven ability to leave the cell as a virus particle. Entry of virus particles into a new cell requires an envelope glycoprotein, encoded by the env gene (Fig. 5), which is embedded in the plasma membrane envelope acquired by the virus when it buds out through the cell membrane. In fact, gypsy group retrotransposons are believed by some actually to be retroviruses. They possess an extra gene of unknown function in the same location as env, and there is some evidence to suggest that the gypsy transposon of Drosophila can form an infectious particle (Kim et al., 1994).

Hepadnaviruses and Caulimoviruses (Pararetroviruses)

Two other virus groups use reverse transcription in their life cycles, though neither normally integrates its DNA into the host chromosomes. Hepadnaviruses (hepatitis B-like viruses) infect vertebrates. They contain a small circular DNA molecule which is partially single-stranded (Fig. 6; Tiollais et al., 1981). This DNA encodes genes specifying a reverse transcriptase, capsid structural components and viral surface proteins. Upon infection, the virion DNA is filled in by the encapsidated reverse transcriptase to form a closed circular double-stranded DNA. Transcription of this template yields an RNA which is translated into the virus-encoded proteins (Summers and Mason, 1982). Initiation of reverse transcription is primed by a protein bound to the virion RNA, unlike all other retroelements, which use RNA primers (Gerlich and Robinson, 1980). Reverse transcription commences in the particle but remains incomplete, forming the partially single-stranded virion DNA.

Caulimoviruses are plant viruses which share with hepadnaviruses the property of encapsidating an incompletely reverse-transcribed DNA (Bonneville et al., 1988). In this case, the capsid nucleic acid is largely double-stranded with a few nicks (Fig. 6). The basic steps of transcription from an extrachromosomal closed circular DNA template into RNA, which is translated into reverse transcriptase and virus particle components, are shared with hepadnaviruses. Priming of caulimovirus reverse transcription uses a tRNA, just as LTR retrotransposons and retroviruses do.



The Evolution of Retroelements 

From the above survey, it is evident that there is a wide variety of genetic elements which use reverse transcriptase for their propagation in a broad spectrum of living organisms from bacteria to man. I said at the outset that we believe the process of reverse transcription to be evolutionarily ancient. Can we assemble all the known manifestations of reverse transcription and the genetic elements involved with this process into an evolutionary tree? In some cases, this is quite easy (Fig. 7). Retroviruses and LTR retrotransposons are obviously related, and the more complex gene structure of the former, plus phylogenetic comparisons of the DNA sequences of these elements (Temin, 1980; Xiong and Eickbush, 1990), suggest strongly that LTR retrotransposons were the ancestors of retroviruses (Fig. 7). Judged by sequence homology and structural similarity, the most likely immediate progenitor of retroviruses was a gypsy group LTR retrotransposon. Ty1-copia group retrotransposons, with their simpler gene structure, probably arose before the gypsy group, though whether they were the direct ancestor is unclear.

The hepadnaviruses and caulimoviruses are more difficult to fit into this picture. Phylogenetic analysis suggests that caulimoviruses evolved from gypsy group LTR retrotransposons (Doolittle et al., 1989; Xiong and Eickbush, 1990), but hepadnaviruses are highly diverged from both groups. Temin has proposed that both viruses evolved from retroviruses by loss of the ability of the extrachromosomal DNA to integrate into the host chromosome (Temin, 1989). Xiong and Eickbush suggest that hepadnaviruses arose from a recombination event between a pre-existing RNA virus and a primitive retrotransposon, while caulimoviruses derived in a similar manner from gypsy group retrotransposons. Both models are plausible, but the latter is more likely, at least in the case of hepadnaviruses, which have a priming mechanism for the initiation of replication that differs from that of retroviruses and resembles that of some RNA viruses, such as poliovirus.

What about the more primitive retroelements? Non-LTR retrotransposons may be the ancestors of LTR retrotransposons, because of their simpler construction, less sophisticated transposition mechanism and ubiquity in the eukaryotes (Doolittle et al., 1989; Xiong and Eickbush, 1990). The evolutionary status of the other elements mentioned here is still very unclear (Doolittle et al., 1989; Xiong and Eickbush, 1990). Sequence comparisons between the reverse transcriptases argue that retrons and the fungal mitochondrial plasmids group together, with group II introns being the nearest relatives to these sequences and non-LTR retrotransposons the next nearest. Additionally, some non-LTR retrotransposons transpose to the telomeric regions of Drosophila chromosomes (Biessmann et al., 1992; Levis et al., 1994), suggesting an evolutionary link with telomerases. All this implies that the best candidate for the progenitor of all these elements belongs to an ad hoc group containing non-LTR retrotransposons, retrons, telomerases, fungal mitochondrial plasmids and group II introns, though it is impossible to say which, if any, came first.

Two properties unite virtually all retroelements, suggesting that they are fundamental to these elements and were present at their genesis. Firstly, virtually all reverse transcriptases prime the synthesis of new DNA from an RNA molecule with abundant secondary structure that is strongly associated with the enzyme. In the large majority of cases this is a tRNA. tRNAs themselves play a central role in living systems as the key component in the conversion of RNA to protein. RNA was probably the original genetic material, and reverse transcriptase and transfer RNAs are the fundamental components of the machinery needed to synthesize DNA and protein, respectively, from RNA. The close association between the two in most retroelements to this day is striking and seems to this author a potent argument for this model of early evolution.

The second property uniting retroelements is their lack of any obvious advantage to their cellular hosts. With the single exception of telomerases, all reverse transcriptases are involved in the propagation of genetic elements that are not involved in the day-to-day functioning of cells. In fact, some retroelements are dangerous parasites (retroviruses, caulimoviruses and hepadnaviruses). Thus, even though their origin probably lies at the dawn of cellular life, these elements have remained aloof from the business of enabling a cell to survive and replicate in an environment.
The Unusual Phylogenetic Distribution of Retrotransposons: A Hypothesis

Jef D. Boeke 

Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA 

Retrotransposons have proliferated extensively in eukaryotic lineages; the genomes of many animals and plants 
comprise 50% or more retrotransposon sequences by weight. There are several persuasive arguments that the 
enzymatic lynchpin of retrotransposon replication, reverse transcriptase (RT), is an ancient enzyme. Moreover, the 
direct progenitors of retrotransposons are thought to be mobile self-splicing introns that actively propagate 
themselves via reverse transcription, the group II introns, also known as retrointrons. Retrointrons are represented in 
modern genomes in very modest numbers, and thus far, only in certain eubacterial and organellar genomes.
Archaeal genomes are nearly devoid of RT in any form. In this study, I propose a model to explain this unusual 
distribution, and rationalize it with the proposed ancient origin of the RT gene. A cap and tail hypothesis is 
proposed. By this hypothesis, the specialized terminal structures of eukaryotic mRNA provide the ideal molecular 
environment for the lengthening, evolution, and subsequent massive expansion of highly mobile retrotransposons, 
leading directly to the retrotransposon-cluttered structure that typifies modern metazoan genomes and the eventual 
emergence of retroviruses. 



The Ancient Origin of Reverse Transcriptase 

There are two arguments for an ancient origin of RT. The first is 
theoretical and is based on the now widely accepted proposal 
that an RNA world preceded the form of biology with which we 
are familiar, the DNA world. Darnell first articulated that RT must 
have been present during the time of the transition between 
these two worlds, and therefore, must be considered ancient 
(Darnell and Doolittle 1986; Fig. 1). The second argument is 
based on the fact that RT genes are very broadly distributed 
among the branches of the tree of life, and have largely (but not 
entirely) descended vertically from an ancestral RT
gene (Doolittle et al. 1989; Xiong and Eickbush 1990; Eickbush 
1994; Malik et al. 1999). Furthermore, the RT gene has seemingly 
reinvented itself in multiple and diverse forms (Boeke and Stoye 
1997). In addition to the familiar retroviruses, there are pararet- 
roviruses, which package DNA, but replicate by reverse transcrip- 
tion, two major classes of retrotransposons (described in the fol- 
lowing section), as well as a more bizarre group of elements 
found in bacteria and organellar genomes, and hence, referred to 
as the prokaryotic group. The discovery of RTs in bacteria, first in 
the form of msDNA (short for multicopy single-strand DNA) or 
retron elements (Yamanaka et al. 2002) and later in the form of 
retrointrons (Belfort et al. 2002), provided dramatic evidence in 
favor of an ancient origin for RT. 

The highly diverse tree of retroelements can be rooted in the 
prokaryotic group of elements (Eickbush 1994). The prokaryotic 
group includes three types, that is, retrons, retroplasmids, and 
retrointrons. Retrons are RT genes that produce an unusual 
branched structure called msDNA made by reverse transcription 
of a precursor RNA primed from an internal guanosine residue— 
unlike other retroelements, they have no known function or abil- 
ity to mobilize autonomously (Yamanaka et al. 2002). Thus far, 
they have been found only in a very limited subset of bacteria. 
Retroplasmids are known only from the mitochondria of certain
fungi, replicate by reverse transcription, and exist in both circular
and linear (hairpin) forms (Kuiper and Lambowitz 1988; Walther
and Kennell 1999). The retrointrons, or group II introns, mobilize
or retrohome to empty target sites (unspliced versions of
their host genes) via a very unconventional mechanism. The ex- 
cised intron lariats insert into double-stranded target DNA (cop- 
ies of the DNA containing the flanking exons but lacking the 
intron) by reversal of the normal splicing reaction, probably 
aided by the maturase activity of the RT proteins encoded by 
these elements. They are then converted into DNA by use of a 
target-primed reverse transcription (TPRT) mechanism similar to 
that used by non-LTR retrotransposons (Zimmerly et al. 1995a,b;
Yang et al. 1996). Priming is facilitated by the action of a small 
endonuclease domain of the RT that cleaves the intact strand of 
the double-stranded target DNA. 

Several independent arguments strongly suggest that the 
prokaryotic group of RT sequences is ancestral to the RT se- 
quences of retrotransposons and retroviruses. Counterarguments 
to each of these proposals exist, but as a group, these proposals 
are compelling. (1) It is a simple evolutionary paradigm that 
things evolve progressively from a simple state to an ever more 
complex one. Retrons, retroplasmids, and retrointrons all encode 
a single RT protein, often with only that enzymatic activity, 
whereas retrotransposons and retroviruses always encode mul- 
tiple enzyme activities and usually encode multiple separate pro- 
teins. These additional activities, which include proteases, zinc 
finger domains, at least three distinct types of endonucleases,
and integrase, appear to have been recruited from eukaryotic 
host genomes at multiple times in evolution, probably using the 
same types of mechanisms used by retroviruses when they pick 
up cellular oncogenes (Telesnitsky and Goff 1997). A widely ac- 
cepted extension of this simple argument is that the retroviruses 
and pararetroviruses evolved from LTR retrotransposons by acquiring
new proteins conferring the ability to efficiently leave
and re-enter host cells, also known as horizontal transfer or lat- 
eral transfer (Doolittle et al. 1989). (2) The RT of one member of 
the prokaryotic group has the ability to perform primer- 
independent synthesis, similar to RNA polymerase, the presumed 
ancestor of RT (Wang and Lambowitz 1993). (3) The RT se- 
quences of the prokaryotic group are the most similar to the 
sequences of the presumed ancestral outgroup of sequences, the 
RNA-directed RNA polymerases (RdRPs). Non-LTR retrotrans- 
posons, LTR retrotransposons, and retroviral RTs are progres- 
sively less closely related to RdRP sequences (Eickbush 1994). (4) 
The sequence of telomerase, a specialized RT considered by many to represent an ancient eukaryotic enzyme, clusters with prokaryotic and non-LTR retrotransposon RT sequences (Eickbush 1997; Nakamura and Cech 1998).

Two Types of Retrotransposons That Mobilize 
by Distinct Mechanisms 

The retrotransposons can be divided into two major groups, the 
non-LTR and the LTR retrotransposons. The mechanisms of these
two types of retroelements are summarized briefly here and in
Figure 2. In addition, two smaller retrotransposon families, the
DIRS1 (Goodwin and Poulter 2001) and BEL (Malik et al. 2000)
groups, appear to be distinct, but are much less widely distrib- 
uted, and thus, will not be discussed further here. 

Of the two major retrotransposon classes, the non-LTR retrotransposons
are less well understood mechanistically, but nevertheless, a good outline
of the process exists (Kazazian and
Moran 1998). The element mRNAs are translated in the cyto- 
plasm, producing one or two proteins. One of these is a polypro- 
tein with at least two critical activities, an endonuclease, and an 
RT. Most of the non-LTR elements also encode an RNA chaper- 
one, whose role remains unclear. However, the endonuclease/RT 
protein is thought to bind the element RNA to form an RNP 
complex, which then enters the nucleus. This complex then ac- 
quires a host DNA target, in which a nick is made by the endo- 
nuclease. In a remarkable target-primed reverse transcription 
(TPRT) process, the 3' OH of the cleaved target DNA primes reverse
transcription of the element RNA, at or near the 3' poly(A) end. The
mechanism of the cutting of the second strand and
second-strand synthesis is less well understood, but may well be 
symmetrical with the first, involving a second round of TPRT 
with the newly made DNA strand serving as template. 
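The first-strand step of TPRT can be pictured as a string operation: the 3' OH left at the nick becomes the primer, and reverse transcription begins at the poly(A) end of the element RNA. The sketch below is a hypothetical cartoon, assuming plain strings for sequences and ignoring second-strand cleavage and synthesis; the function and variable names are illustrative only.

```python
"""String-level cartoon of first-strand target-primed reverse transcription.

Hypothetical illustration: sequences are plain strings, and the second
strand (cleavage and a possible second round of TPRT) is not modelled.
"""

# Complementary DNA base for each RNA template base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_tprt(target_strand: str, nick_pos: int, element_rna: str) -> str:
    """Return the nicked target fragment extended by first-strand cDNA.

    target_strand: 5'->3' sequence of the target DNA strand that is nicked.
    nick_pos: the nick falls after this many nucleotides; the 3' OH of
              target_strand[:nick_pos] serves as the primer.
    element_rna: 5'->3' element RNA, ending in its poly(A) tail.
    """
    if not 0 < nick_pos < len(target_strand):
        raise ValueError("the nick must fall inside the target")
    primer = target_strand[:nick_pos]  # upstream fragment with the free 3' OH
    # RT reads the template from its 3' (poly(A)) end toward its 5' end, so the
    # new DNA is the reverse complement of the RNA and starts with T residues.
    cdna = "".join(RNA_TO_CDNA[base] for base in reversed(element_rna))
    return primer + cdna  # cDNA is covalently continuous with the target DNA


if __name__ == "__main__":
    print(first_strand_tprt("GGATCCAAAT", 5, "AUGCAAAA"))
```

Running the example prints `GGATCTTTTGCAT`: the target-derived primer `GGATC` followed by cDNA that begins with the `TTTT` copied from the poly(A) tail.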

LTR retrotransposons move via a mechanism quite similar 
to that used by retroviruses. Generally, two primary protein prod- 
ucts are made, corresponding to retroviral Gag (coat proteins) 
and the readthrough product Gag-Pol (RT and other enzymes). 
The Gag proteins together with two RNA molecules are as- 
sembled into a virus-like particle (VLP). This encapsidation may 
serve to further protect the element's genomic RNA molecules 
from degradation. Reverse transcription occurs in the VLP, and is 
primed by a cellular tRNA (Chapman et al. 1992) or retrotrans- 
poson RNA fragment (Levin 1995). The initial product of the RT 
reaction, minus strand strong-stop DNA, is transferred to the 3' 
end of the RNA in a critical step that leads to subsequent comple- 
tion of the minus strand DNA synthesis. If t n reiati nal 
amounts of RNA were lost exonucleolytically from the 5' or 3' 
end during this process, retrotransposition would fail. Several 
additional steps similar to those used by retroviruses, including a 
second priming event and strand transfer, lead to the final prod- 
uct of reverse transcription, a double-stranded DNA (Boeke and 
Stoye 1997; Telesnitsky and Goff 1997). RNA integrity is important
for this process, which can take several hours to complete;
however, a recombination-like template switching process can 
bypass damage to the element's RNA. The resulting DNA, together with the integrase protein (processed previously by an element-encoded protease from the RT precursor protein Gag-Pol), is transported to the nucleus, where it inserts via a transesterification reaction very similar to that used by DNA transposons (Mizuuchi and Baker 2002).

Figure 2 Retrotransposition mechanisms. The lifecycles of non-LTR (left) and LTR retrotransposons are outlined. Wavy lines are RNA molecules; thin black lines are cDNA strands.

Modern-Day Distribution of RT Genes

Given the assumption that RT is an ancient enzyme, it becomes difficult to explain the modern-day distribution of RT genes in the three kingdoms of life, Eubacteria, Archaea, and Eukarya. The majority (67%) of sequenced eubacterial species lack a detectable RT gene in their genome (Fig. 3), and those eubacterial species that do contain RT genes mostly carry only one or two. The great majority of Archaea lack recognizable RTs altogether; the only exception to this trend, Methanosarcina, has a very large genome thought to have been formed by the incorporation of a large segment of a eubacterial genome as a late lateral transfer event in its evolution (Deppenmeier et al. 2002). This species contains a set of retrointrons similar to those found in eubacteria (Dai and Zimmerly 2003). In contrast, RT genes are found in virtually all eukaryotic genomes, generally in 20 to >500,000 copies per genome. Even when adjusted for genome size, eukaryotes contain significantly more RT genes. Virtually all of these are non-LTR and/or LTR retrotransposons. In a recent grand synthesis, Bushman poetically described eukaryotic genomes as "genes floating on a sea of retrotransposons" (Bushman 2002), although an astute reviewer of this work points out that genes do not float, or else gene order colinearity would not be observed in genomic comparisons. Well-known extreme examples of this retrotransposon abundance include the human genome, with ~1,000,000 non-LTR retrotransposons, SINEs, and endogenous retroviruses (Smit 1996), and the maize genome, estimated to contain ~200,000 copies of intact retrotransposons (SanMiguel et al. 1996; J. Bennetzen, pers. comm.). It is the abundance of retroelements that largely explains the C-value paradox in most metazoans. What led to such an abundance of RT genes?

It could be argued that the observed discrepancy is a simple 
consequence of genome streamlining in bacterial genomes. Al- 
though there is no doubt that streamlining is a major evolution- 
ary force in both eubacteria and Archaea, one can consider as a 
control for the above conclusion the distribution of DNA trans- 
posons among the three kingdoms. DNA transposons are found 
in almost all eubacterial and archaeal genomes, typically in 10 to
100 copies. They are also found in eukaryotes, but have a somewhat
spottier distribution there, being quite

II r p in 1 in rl un , ,, Dn ,p! 
Brassica), but notably absent from others (Saccharomyces, Schizosaccharomyces).

The dramatic discrepancy in retroelement distribution be- 
tween prokaryotes and eukaryotes strongly suggested to me that 
there was some special feature(s) of being eukaryotic that repre- 
sented a permissive state for RT and allowed the evolution and 
proliferation of retrotransposons. 
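The RT-gene counts summarized in Figure 3 rest on a simple tabulation: BLASTp hits with an E value below 0.001 are counted per genome for each of the two queries, and the larger of the two counts is kept. The sketch below is a hypothetical reconstruction of that bookkeeping, assuming BLAST tabular output (`-outfmt 6`, E value in the eleventh column) and a caller-supplied mapping from subject sequence ID to genome; it is not the original analysis, which used the web resource named in the Figure 3 legend.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-3  # "E value <0.001" in the Figure 3 legend


def count_hits(blast_lines, subject_to_genome):
    """Count distinct subject proteins per genome with E < cutoff (one query)."""
    hits = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])  # outfmt 6: sseqid, evalue
        if evalue < E_VALUE_CUTOFF:
            hits[subject_to_genome[subject]].add(subject)
    return {genome: len(ids) for genome, ids in hits.items()}


def rt_gene_counts(per_query_counts, genomes):
    """For each genome, keep the higher count across queries (0 if no hits)."""
    return {g: max(c.get(g, 0) for c in per_query_counts) for g in genomes}


if __name__ == "__main__":
    # Hypothetical rows: qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    ltra_rows = ["LtrA\tA_1\t40\t300\t0\t0\t1\t300\t1\t300\t1e-20\t90"]
    retron_rows = [
        "retron\tA_1\t30\t250\t0\t0\t1\t250\t1\t250\t1e-5\t50",
        "retron\tA_2\t28\t240\t0\t0\t1\t240\t1\t240\t2e-4\t45",
    ]
    genome_of = {"A_1": "genome_A", "A_2": "genome_A"}
    counts = [count_hits(ltra_rows, genome_of), count_hits(retron_rows, genome_of)]
    print(rt_gene_counts(counts, ["genome_A", "genome_B"]))
```

Counting distinct subject proteins rather than raw alignment lines is a design choice of this sketch, so that several HSPs on one protein are not counted as several RT genes.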

The Evolution of Eukaryotes and Their Retroelements 
The release of numerous eubacterial, Archaeal, and eukaryotic genome sequences has provided extensive fodder for models of how eukaryotes evolved. It is clear that we eukaryotes contain a mixture of genes descended from Archaeal and eubacterial ancestor cells (Woese et al. 1990; Margulis 1996). The precise sequence of events in the evolution of eukaryotes has been debated hotly, but a consensus is developing about the major events that must have occurred. This consensus view will be recounted here.

Figure 3 Bacterial genomes contain very few RT genes. A total of 96 completely sequenced bacterial genomes were searched by BLASTp on the comprehensive microbial resource at www.tigr.org. The two queries used were the LtrA RT from a Lactobacillus locus group II intron (Q57005) and a retron RT from Escherichia coli (P23070). The number of BLAST hits with an E value <0.001 was tabulated for both queries, and the higher number was taken as the measure of RT gene number (visual inspection showed that this modestly inflated the number of RT genes as some of

Archaea and eubacteria were two ancient lineages of cells 
that had evolved distinct mechanisms of transcription and DNA 
replication, among other things, but otherwise shared the fun- 
damental properties of being unicellular heterotrophs. Symbiosis 
of eubacterial cells (the progenitor of the mitochondrion) and 
Archaeal cells ultimately led to a proto-eukaryote containing a
eubacterial endosymbiont. This may have begun as a casual or 
accidental symbiosis, but at some point, provided some impor- 
tant selective advantage. Several other events followed, probably 
involving an additional cycle(s) of acquiring additional genomes 
via consumption (Taylor 1974), as well as the acquisition of a 
number of other distinctive eukaryotic features, which will be 
considered separately in the next section. These events gave rise 
to a primitive eukaryote with the recognizable nuclear genome 
and mitochondrial genome, each in a membrane-bounded com- 
partment. Acquisition of an additional photosynthetic bacterium 
by consumption led to a plant lineage, but for simplicity, this will 
not be considered further here. Because the modern eubacteria 
contain RT genes and the Archaea largely lack them, I will make
the fairly arbitrary assumption that the same was true at the 
dawn of eukaryotes. The eubacterially derived endosymbiont(s) 
slowly transferred its genes to the nucleus of the primitive eu- 
karyotic cell, becoming ever more dependent on its host. Re- 
markably, this process of gene transfer from mitochondria to 
nucleus is functional in modern-day yeast cells, in which the 
transfer of mitochondrial gene segments to the nucleus can be 
observed experimentally (Thorsness et al. 2002). In this way, RT
genes present as retrointrons would readily be transferred to the
nucleus by a passive and stochastic process. Movement of
retrointrons via homing to near-cognate sites
might well have led to a proliferation of introns and the evolu- 
tion of the splicing apparatus as an intron-removal mechanism. 
The stage was set for the evolution of retrotransposons. What 
specific features of eukaryotic cells made this possible? 



The Nuclear Membrane 

The existence of a nuclear membrane would appear to be an impediment and not a help to the evolution of retrotransposition. Translation occurs outside the nucleus, whereas transposition happens inside; retrotransposons have therefore had to evolve transport mechanisms to overcome this barrier. Thus, the existence of the nuclear membrane is, if anything, inhibitory to successful retrotransposition.

Linear Chromosomes

The transition to linear chromosomes from the presumed ancestral circular state may well have provided an early opportunity for an RT gene to make itself indispensable to its host by acquiring the ability to lengthen telomeres. This would have led to the enzyme telomerase, still the major mechanism for telomere formation in modern eukaryotes (Nakamura and Cech 1998), which provides an elegant solution to the end-replication problem posed by the termini of linear DNA molecules. However, this would provide a niche for but a single copy of RT. Also, a compelling case can be made that telomerase was a relatively late acquisition, derived by devolution of a retrotransposon (Pardue et al. 1997), as telomeres could, in principle, solve the end-replication problem via the formation of T-loops (Griffith et al. 1999). Thus, linear chromosomes per se do not provide a compelling opportunity for the evolution of retrotransposons.



As argued above, RT may have played an important role in the 
widespread accumulation of introns in primitive cells, although 
the timing of this event has also been the subject of much debate 
(Gilbert and Glynias 1993; Logsdon Jr. and Palmer 1994; Stoltz- 
fus 1994; Logsdon Jr. 1998; Simpson et al. 2002). However, the 
simple existence of these introns did not confer any special se- 
lective advantage on RT genes. Rather, the proliferation of in- 
trons may well be a consequence of a permissive RNA environ- 
ment that allowed them to mobilize more readily in the genome. 

Sex and Diploidy?

Donal Hickey (Hickey 1993) and Tim Bestor (Bestor 1999) have provided eloquent arguments that the evolution of sex and diploidy provides an opportunity for mobile elements to invade host species and march inexorably to fixation in the host genome, provided they do not decrease the fitness of their host by more than 50%.
However, this argument applies to both retrotransposons and 
DNA transposons, and thus, is insufficient to explain the selec- 
tive amplification of retrotransposons in eukaryotes. 

RNA Processing Machinery 

The physical separation of the processes of transcription and translation, changes in gene organization (perhaps the consequence of the nascent eukaryal nuclear genome being bombarded with fragments of its endosymbiont guest DNA), and other factors led to important changes in the way RNA was metabolized in eukaryotic cells. The major change was the compartmentalization of single coding regions (by and large) into stereotypical mRNA structures punctuated by a 5' cap structure and a 3' poly(A) tail (Fig. 4). Not only are these mRNA structures characteristic of eukaryotic cells, but the cap and tail also act together to play a critical role in eukaryotic translational initiation; the 5' cap and 3' poly(A) interact in the cytoplasm, effectively circularizing the RNA.

Figure 4 The cap and tail structure of eukaryotic mRNA.

Other Factors 

There are likely to be many factors that control the proliferation of transposable elements of all types in eukaryotes. For example, organisms with smaller genomes, such as yeasts and Fugu, tend to have much higher recombination rates, and these organisms carry lighter transposon burdens. Ergo, it can be argued that such high recombination rates are inconsistent with explosive types of transposon amplification, such as has been seen in humans and maize: very high transposon copy numbers could cause extensive secondary damage to highly recombination-proficient genomes. Similarly, diverse mechanisms controlling transposon activity, such as cosuppression of Ty1 elements in yeast (Jiang 2002) and RNAi in many eukaryotes (Ketting et al. 1999), play important roles in controlling transposon copy numbers. Whereas such factors are undoubtedly of considerable importance in determining transposon copy numbers in individual species, they do not help to explain the general trend that eukaryotes have very high retrotransposon copy numbers relative to prokaryotes.

The cap and tail hypothesis proposes that this unique terminal structure created three special molecular opportunities for the evolution of retrotransposons. First, these termini created a very stable, long-lived genomic RNA freed from the necessity to be highly folded. Second, this RNA stability made the recombinational acquisition of additional host gene modules needed for the formation of retrotransposons much more likely, because the long mRNAs typical of retrotransposons and retroviruses were protected from destruction by exonucleases. Third, these terminal RNA structures provided precise punctuation marks defining the retrotransposon termini and facilitating their reproduction without the loss of even a single terminal nucleotide. These traits set the stage for the evolution of the elaborate and precise reverse transcription processes used by retrotransposons.

Why Did Eukaryotes Evolve Caps and Tails? 
A number of theories have been advanced as to the evolution of 
the cap and tail. Extensive work on the molecular biology of 
translation has shown that the 5' cap and 3' tail structures are 
directly required for initiation of translation in eukaryotes. Ad- 
ditionally, both RNA structures are protective against terminal 
degradation of the RNA. In particular, the protective role of the 5' 
cap is revealed by the eukaryotic mRNA degradation pathway; 
this process occurs in three steps: (1) 3' deadenylation, leading to (2) decapping, followed by (3) 5' to 3' exonuclease action (Tucker and Parker 2000). Although 3' exonucleases are found in eukaryotic cells, they appear to play more specialized roles in mRNA stability, such as nonstop decay and the destabilization of specific mRNAs (van Hoof and Parker 2002).
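The ordering in the decay pathway above is what makes the cap protective: the 5' exonuclease cannot begin until the tail has been removed and the cap taken off. A toy model of that ordering follows; it is an illustrative sketch only, and the class, its method names, and the strict rule that decapping must wait for complete deadenylation are simplifications assumed for the example.

```python
"""Toy model of ordered eukaryotic mRNA decay:
(1) 3' deadenylation, (2) decapping, (3) 5'->3' exonuclease digestion.

Hypothetical simplifications: deadenylation removes the whole tail at once,
decapping is refused while any tail remains, and a capped 5' end is never
attacked by the 5'->3' exonuclease (Xrn1p in yeast).
"""
from dataclasses import dataclass


@dataclass
class MRNA:
    body: str            # sequence between the cap and the poly(A) tail
    poly_a_length: int   # number of A residues in the tail
    capped: bool = True

    def deadenylate(self) -> None:
        """Step 1: remove the poly(A) tail."""
        self.poly_a_length = 0

    def decap(self) -> None:
        """Step 2: remove the cap, allowed only after deadenylation here."""
        if self.poly_a_length > 0:
            raise RuntimeError("decapping requires prior deadenylation")
        self.capped = False

    def exonuclease_5to3(self, nucleotides: int) -> None:
        """Step 3: trim from the 5' end, but only if the end is uncapped."""
        if self.capped:
            return  # a capped 5' end is not a substrate in this model
        self.body = self.body[nucleotides:]


if __name__ == "__main__":
    rna = MRNA("AUGGCC", poly_a_length=50)
    rna.exonuclease_5to3(3)   # no effect: the cap is still present
    rna.deadenylate()
    rna.decap()
    rna.exonuclease_5to3(3)   # now the 5' end is digested
    print(rna.body)
```

In this model the intact, capped and tailed molecule loses no terminal nucleotides, which mirrors the protective role attributed to the cap and tail in the text.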

Polyadenylation occurs in all three kingdoms of life, al- 
though it only affects a subset of mRNAs in bacteria, and actually 
stimulates mRNA breakdown in prokaryotes (Steege 2000). Thus, 
certain components of the polyadenylation machinery predated 
the evolution of eukaryotes, and it appears that poly(A) simply 
acquired new functions in eukaryotes. In the literature and in 
discussions with colleagues, I've become aware of three theories 
regarding selective pressures leading to a need for a 5' cap. The 
first theory is that the compartmentalization of transcription 
leads to extensive opportunities for potentially inhibitory RNA 
folding prior to translation— potentially, such mRNA hairballs 
could occlude internal Shine-Dalgarno initiation sequences,
whereas a terminal cap structure could more readily be recog- 
nized, like the end of a ball of yarn (Hershey and Merrick 2000). 
A second theory is that the complex nature of RNA processing in 
eukaryotes could lead to large numbers of misprocessed RNAs. 
Expression of inappropriately processed RNAs could lead to the 
expression of deleterious dominant negative protein fragments 
for example. Obviously, there are special pathways such as Non- 
sense-mediated decay (Frischmeyer and Dietz 1999; Gonzalez 
et al. 2001) and Nonstop decay, which deal with some of the RNA 
quality control issues raised by the existence of potentially inac- 
curate splicing machiner H (i i bird tv) fj 1 
is conferred by the obligatory circularization of mRNA during 
translational initiation: any RNA lacking intact 5' or 3' ends will
not be translated (R. Green, pers. comm.). 

Finally, Stewart Shuman has proposed that the cap arose to 
protect the RNA from 5' exonuclease action, and that the latter 
activity represented a type of primitive immunity against RNA 
viruses (Shuman 2002). Thus, the Xrn1p 5' exonuclease may
have arisen in response to genomic RNA invaders, and the cap- 
ping machinery evolved in parallel to protect endogenous cellu- 
lar mRNAs. It is clear that eukaryotic cells evolved a series of 
different immunity mechanisms against invading RNA genomes, 
including the interferon system (Kumar and Carmichael 1998) 
and RNAi (Ketting et al. 1999). Needless to say, if this scenario is 
correct, the primitive immunity conferred by 5' exonuclease was 
quickly evaded by viruses that acquired caps by various nefarious 
means or evolved IRES elements that bypassed the cap require- 
ment (Shuman 2002). Nevertheless, it would appear that the ac- 
quisition of the cap/5' exo strategy paradoxically set the stage for 
the evolution of a collection of internal genome invaders of eu- 
karyotes, and eventually, retroviruses. 

An interesting difference between bacteria and eukaryotes that may be related to differential RNA stability is the ability of eukaryotes to produce significantly longer proteins, such as the long polyproteins encoded by retrotransposons. Interestingly, a survey of bacterial genomes (Fig. 5) shows that bacteria, on average, encode shorter proteins than eukaryotes. This discrepancy becomes particularly acute when the longest ORFs are examined. The longest ORF in Escherichia coli K12, a putative invasin at 2383 codons, is less than half the length of the longest Saccharomyces cerevisiae ORF, the MDN1 gene at 4910 codons, and pales in comparison to human titin at 27,118 amino acids, encoded by an astonishingly long 82-kb mRNA (Labeit and Kolmerer 1995). This limit to ORF size does not represent an absolute expression block in bacteria, as some very large ORFs encoding nonribosomal polypeptide and polyketide synthases have been discovered in various bacterial species. It is possible that the simple lifestyle of prokaryotes generally requires shorter proteins than the complex lifestyle of eukaryotes. The evolution of a more stable mRNA structure in eukaryotes may well have contributed to the much greater potential complexity of protein structure in eukaryotes.
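The size comparison above can be checked with a few lines of arithmetic using only the figures quoted in the text. The sketch assumes one stop codon is added when converting titin's amino-acid count to coding nucleotides, and treats "82 kb" as 82,000 nucleotides; the constant names are illustrative.

```python
"""Arithmetic behind the longest-ORF comparison, using figures from the text."""

ECOLI_LONGEST_CODONS = 2383   # putative invasin, E. coli K12
YEAST_LONGEST_CODONS = 4910   # MDN1, S. cerevisiae
TITIN_AMINO_ACIDS = 27_118    # human titin
TITIN_MRNA_NT = 82_000        # "82-kb mRNA", taken as 82,000 nt (assumption)

# E. coli longest ORF as a fraction of the yeast longest ORF.
ratio = ECOLI_LONGEST_CODONS / YEAST_LONGEST_CODONS
print(f"E. coli / yeast longest ORF: {ratio:.3f}")

# Coding nucleotides for titin: three per amino acid plus one stop codon.
titin_cds_nt = 3 * TITIN_AMINO_ACIDS + 3
print(f"titin coding sequence: {titin_cds_nt} nt of an ~{TITIN_MRNA_NT} nt mRNA")

# How many times longer titin is than the longest E. coli ORF.
print(f"titin / E. coli longest: {TITIN_AMINO_ACIDS / ECOLI_LONGEST_CODONS:.1f}x")
```

The ratio comes out at about 0.485, consistent with "less than half", and the roughly 81.4 kb of titin coding sequence fits within the 82-kb transcript.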
Figure 5 Bacteria and Archaea encode smaller proteins than eukaryotes. The number of codons in each ORF for the indicated organisms was sorted in Excel, and a point was plotted for each protein. It can be seen readily that both the mean length and total length of the eukaryotic proteins are significantly higher than those of both the eubacterial and archaeal species. Results are typical (data not shown).
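The Figure 5 procedure (sort per-protein lengths, then compare mean and total length between organisms) can be sketched outside a spreadsheet as shown here. This is a hypothetical reconstruction, assuming one protein FASTA file per organism with lengths measured in amino acids (one fewer than codons when a stop codon is counted); the parser and function names are illustrative, not the original method.

```python
"""Length profile in the style of Figure 5, from protein FASTA text."""
from statistics import mean


def protein_lengths(fasta_text: str) -> list[int]:
    """Return the length in residues of each record, ignoring '*' stop marks."""
    lengths, current = [], 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current:
                lengths.append(current)  # close the previous record
            current = 0
        else:
            current += len(line.strip().rstrip("*"))
    if current:
        lengths.append(current)  # close the final record
    return lengths


def length_profile(lengths):
    """Sort lengths (one point per protein) and summarize them."""
    ranked = sorted(lengths)
    return {
        "n": len(ranked),
        "mean": mean(ranked),
        "total": sum(ranked),
        "max": ranked[-1],
        "ranked": ranked,  # the series that would be plotted
    }


if __name__ == "__main__":
    demo = ">p1\nMKV\nLL*\n>p2\nMK\n"
    print(length_profile(protein_lengths(demo)))
```

Comparing the returned `mean` and `total` values across organisms reproduces the kind of contrast described in the Figure 5 legend.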



Bacteria Are RNA-Hostile 

Recent work on the degradation of bacterial mRNAs has eluci- 
dated the basic molecular mechanisms, which are quite different 
from the eukaryotic mechanism (Table 1). In summary, eubacte- 
ria like E. coli degrade their RNAs through the combined effects of 
multiple endonucleases and 3' exonucleases; many of the rel- 
evant activities are organized in degradosomes (Steege 2000). 
One of the well-studied endonucleases, RNase E, nicks unstructured
RNA regions adjacent to structured regions. Whereas the
products of such nicking are not necessarily excellent direct sub- 
strates for the 3' exonucleases, addition of a 3' poly(A) tail creates 
an opportunity to initiate the degradation process by the degra- 
dosome; hence, mRNA polyadenylation leads to degradation in 
eubacteria. 

If the cap and tail hypothesis is correct, it makes a number 
of predictions— for example, intact long RNA molecules should 
be difficult to detect in bacteria. It has long been known that it is 
extremely difficult to detect bacterial mRNAs by Northern blot- 
ting, and typical measurements of bacterial RNA half-lives range 
from seconds to minutes— far shorter than the half-lives of their 
eukaryotic counterparts, even when the mean mRNA half-life is 
adjusted for the cell generation time (Fig. 6). Only a single value
for average mRNA half-life is available from an Archaeal species,
Sulfolobus solfataricus, which is among the slower-growing Archaea
(some Archaea have fast doubling times similar to those of
eubacteria), and its RNA half-life value is intermediate between 
eubacteria and S. cerevisiae, a eukaryote with a relatively short 
mRNA half-life (Bini et al. 2002). Examination of the mRNA deg- 
radation components encoded in eubacterial, Archaeal, and eu- 
karyotic genomes shows that the eubacteria and Archaea share 
most of the same genes. Homologs of RNAse E, RNAse II, and 
polynucleotide phosphorylase are readily found by BLAST 
searching against Archaeal genomes. In contrast, Xrn1p ho-
mologs and capping enzyme homologs are absent from eubacte-
ria and Archaea, but are common to all eukaryotes (Ananthara-
man et al. 2002). Furthermore, like eubacteria, Archaea organize
at least some of their genes in operons, and use Shine-Dalgarno
sequences to guide ribosomes to their initiation sites, at least in
some mRNAs (Shuman 2002), suggesting
that translational initiation mechanisms in 
Archaea are more similar to those in eubac- 
teria than in eukaryotes. 
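The generation-time adjustment behind Fig. 6 is simple arithmetic. The Python sketch below, offered only as an illustration, expresses a half-life in units of the doubling time and, assuming first-order (exponential) decay (an assumption; the text does not state a decay model), estimates how much of an mRNA cohort would survive one generation. The inputs are hypothetical round numbers, not values from Bini et al. (2002).

```python
def normalized_half_life(half_life_min: float, generation_time_min: float) -> float:
    """Express an mRNA half-life in units of the cell generation (doubling) time."""
    if half_life_min <= 0 or generation_time_min <= 0:
        raise ValueError("half-life and generation time must be positive")
    return half_life_min / generation_time_min


def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA cohort left after `elapsed_min`, ASSUMING simple
    first-order (exponential) decay with the given half-life."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 2.0 ** (-elapsed_min / half_life_min)


# Hypothetical round numbers for illustration only (not data from Bini et al. 2002):
# a 2-minute mRNA half-life in a cell that doubles every 30 minutes.
ratio = normalized_half_life(2.0, 30.0)
survivors = fraction_remaining(30.0, 2.0)
print(f"half-life = {ratio:.3f} generations; "
      f"fraction left after one generation = {survivors:.1e}")
```

Dividing by generation time puts fast- and slow-growing organisms on a common scale, which is the comparison the figure makes.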

Eubacterial Retroelements Have 
Small, Highly Structured RNAs 
With Occluded 3' Ends

A second prediction of the cap and tail 
model is that those retroelements that are 
found in eubacteria and Archaea will exhibit 
genomic features suggestive of protection 
against RNA degradation, such as short 
length, extensive secondary structure, and 
occluded 3' ends. The two major classes of 
eubacterial retroelements display just these 
features. Retrointron RNA genomes are 
much shorter than retrotransposons and ret- 
roviruses, typically extending only 1-2 kb 
long versus 4-8 kb or more for typical retro-
transposons and 10 kb or more for typical
retroviruses. They are highly folded, their 5'
ends are occluded via a 2'-5' linkage, and
they are always found in the form of a
highly specific RNP, in which the RT
protein is tightly bound to the in-
tronic RNA. Importantly, the 3' terminus of
these molecules consists of a series of Wat-
son-Crick base pairs at the base of the domain VI stem of the
intron, followed by two or three unpaired bases that can form a
tertiary interaction with an internal segment in the intron (γ/γ'
sequences; Bonen and Vogel 2001). Similarly, the retron genome
consists of a small, highly folded molecule in which the 3' end of 
the RNA component is base paired to the 3' end of the DNA 
component (Yamanaka et al. 2002). 

Retrotransposon RNAs Are Capped and Polyadenylated 

Nearly all retrotransposon RNAs contain caps and poly(A) tails, 
as do retroviral RNAs. The case is quite clear for LTR retrotrans- 
posons and retroviruses; there are many reports of poly(A) at the 
3' ends of LTR retrotransposon RNAs, and further evidence for 
posttranscriptionally added 3' poly(A) tails in LTR retrotrans- 
posons can be found readily in EST databases. Capping is more 
laborious to evaluate, but some studies have been performed; for 
example, Ty1 mRNA was examined directly and found to be
capped (Mules et al. 1998b), as are retroviral RNAs. Because LTR 
retrotransposons encode proteins required for their own mobil- 
ity, and these must be translated from their mRNAs, it is ex- 
tremely likely that all LTR-retrotransposon RNAs are capped. 
One of the distinctive features of non-LTR
retrotransposons is that the vast majority of these elements actually
encode poly(A) in their DNA. This 3' poly(A) tract defines the
element's 3' end; many studies suggest that the 3' poly(A) tract
defines the site at which target-primed reverse transcription (TPRT) initiates
(Moran et al. 1996). These poly(A) tails are peculiar in that they
are apparently synthesized, at least in part, by RNA polymerase 
rather than the conventional polyadenylation machinery. How- 
ever, it is possible that the 3' poly(A) residues might be added 
post-transcriptionally using conventional polyadenylation. 
There are a few non-LTR retrotransposons, such as the Drosophila
I factor, which terminate not in poly(A), but in a related se-
quence, (TAA)n. Clearly, the 3' end of I factor RNA is not formed
by conventional polyadenylation, but by transcription. Never- 
theless, the number of TAA repeats can increase during retro- 
transposition, suggesting that a mechanism other than conven- 
Table 1. RNA Degradation Components in the Three Kingdoms of Life(a)

Eubacterial components (legible rows): RNAse G (rng); RNAse III (rnc);
RNAse II (rnb); polynucleotide phosphorylase (pnp); oligoribonuclease (orn).
Eukaryotic components: deadenylase (COM); decapping enzyme (DCP2);
exonuclease (XRN1); exosome (multiple).

Legible function entries: cuts single-stranded RNA adjacent to structured
regions; provides [...]; degradosome [...]; [...] release; cuts double-stranded
RNA; provides internal access points for [...]; 3'-5' exoribonuclease; removes
nucleotides from the 3' poly(A) tail during translation (Chen et al. 2002;
Tucker et al. 2002); removes the 5' cap from deadenylated mRNA.

Legible distribution entries (in table order): Eubacteria, Archaea (Eukarya:
weak, exosome); Eubacteria, Archaea (weak) (Eukarya, weak); Eubacteria,
Archaea, some homologs in Eukarya; Eubacteria; Eubacteria, Archaea, some
homologs in Eukarya; Eubacteria, Archaea; Eukarya.

(a) Taken from Anantharaman et al. (2002).

tional polyadenylation leads to the lengthening of the element 3' 
end, probably through slippage by the I factor RT (Pritchard et al. 1988).
Interestingly, the I factor 3' sequence can be replaced with 
poly(A), and the modified elements produce progeny elements 
with 3' poly(A) tails (Chambeyron et al. 2002). Intriguingly, a 
significant subset of human L1 elements carry a related (TAAA)n
repeat in place of poly(A) (Szak et al. 2002). There are a few
non-LTR retrotransposons, such as the CR1 element, that termi-
nate in a 3' terminal-repeated sequence unrelated to poly(A)
(Burch et al. 1993). Presumably, these mRNAs have found an-
other way to be circularized during translation, as they must be
translated. Because this type of element lacking a poly(A)-like
sequence is rare, I would propose that this is some late evolu-
tionary adaptation. Clearly, the ancestral state of this family of
elements is a 3' poly(A) tail.

Capping, however, has not been directly studied in the non- 
LTR retrotransposons, although the similarity of these elements' 
RNAs to mRNA strongly suggests that they are capped. There is 
evidence that the Drosophila jockey non-LTR retrotransposon is 
transcribed by RNA polymerase II: its mRNA syn-
thesis is α-amanitin sensitive (Mizrokhi et al. 1988). All known
pol II mRNAs are capped; therefore, non-LTR mRNAs are unlikely
[Figure 6; organisms shown: E. coli, S. solfataricus, S. cerevisiae, H. sapiens]

Figure 6 Average mRNA half-lives of diverse organisms, adjusted for
generation time. The data are plotted directly from Bini et al. (2002).



to be exceptions to this rule. Finally, the sequence of the human
L1 element provides presumptive evidence for capping. Previous
in vitro studies have shown that various RTs can readily copy the
G residue comprising the cap, in spite of its unusual 5'-5' tri-
phosphate linkage to the mRNA (Hirzmann et al. 1993; Volloch
et al. 1995; Mules et al. 1998a). The L1 sequence starts with a run
of a variable number of G residues, which I propose has accumu-
lated through multiple rounds of cap reverse transcription. In
support of this exotic idea, the majority of extra single nucleo-
tides accumulated at the 5' junction of experimentally isolated
new full-length L1 insertions are G residues, whereas truncated
L1 elements do not prefer single-G insertions (Symer et al. 2002;
N. Gilbert, S.L. Lutz-Prigge, and J.V. Moran, pers. comm.).

A final exception to the general rule that eukaryotic retro- 
transposons are capped and polyadenylated is also instructive 
and supports the model, namely, the case of the Alu element and
the related SINEs. These unusual elements don't need a cap, be- 
cause they are not translated, but rely on retrotransposition pro- 
teins encoded by other non-LTR retrotransposons. Intriguingly, 
these elements are polyadenylated through transcription, even 
though they are transcribed by RNA polymerase III and, hence, 
are extremely unlikely to interact with the polyadenylation ap- 
paratus. However, these pol III transcripts lack a 5' cap. A differ- 
ent mechanism of protection from 5' exonuclease is adopted by 
these elements; as in the case of eubacterial retroelement 3' ends,
Alu and related retroelements, as well as the tRNA-derived retro-
elements, are also highly folded and the 5' end of the RNA is
always found in an extensively base-paired structure (e.g., see
Sinnett et al. 1992), which would protect it against Xrn1-like 5'
exonucleases. 

Retrotransposon RNA levels are highly variable and tend to 
be tissue specific in metazoans, with high levels reached only in 
the germ line in most cases (Chaboissier et al. 1990; Branciforte 
and Martin 1994). Naturally, the abundance of retrotransposon 
RNAs is very strongly correlated with retrotransposition fre- 
quency. Because retrotransposition frequencies are set by some 
complex evolutionary interplay unique to each host/retrotrans- 
poson combination, it is not surprising that there is great vari- 
ability in retrotransposon RNA levels. Nevertheless, there are 
some dramatic cases of very high retrotransposon RNA lev-
els that provide strong evidence that the cap and tail structures are
compatible with high levels of retrotransposon transcript stabil-
ity. Of note, the Drosophila retrotransposon copia is so named
because of its incredibly copious mRNA (Young and Hogness
1977), and yeast Ty1 mRNA levels are among the most abundant
in the yeast cell (Curcio et al. 1990), with Ty1 mRNA visible as a
discrete band in poly(A)-selected RNA preparations.

In conclusion, the stable and well-punctuated mRNA system 
was probably critical in allowing eukaryotes to evolve an ever 
more complex lifestyle, permitting longer, more complex pro-
teins and increased molecular diversity through alternative splic-
ing. This same key change probably led to the extensive prolif-
eration of retroelements, including retroviruses, in the many 
complex guises in which they are found today. 
mine. A single dose of clozapine increases
dopamine release in the primate prefrontal 

creases basal extracellular dopamine con- 
centration in the prefrontal cortex (21). 
Although this may not be the only mecha- 
nism by which clozapine elicits its effects on 
PCP-induced cognitive dysfunction, this 
activation of the dopamine system of the 
prefrontal cortex may contribute to the 
ability of clozapine to ameliorate the im- 
pairments in our model and, perhaps, in 
schizophrenia. 

Our data show that repeated administra- 
tion of PCP inhibits basal and stimulated 
dopaminergic function in the prefrontal 
cortex of the monkey brain. The deficiency 
of dopamine in the prefrontal cortex that is 
induced by repeated administration of PCP 
is associated with a long-lasting cognitive 
deficit, which is ameliorated by the atypical 
therapeutic drug clozapine. These effects 
are observed long after PCP administration 
is stopped and thus cannot be attributed to 
direct effects of the drug. This primate mod- 
el of dopamine dysfunction in the cortex 
may provide a paradigm for investigating 
the pathophysiology underlying neuropsy- 
chiatric disorders associated with a primary 
cognitive dysfunction in the cortex and a 
dopaminergic deficit in the prefrontal cor- 
tex, as is hypothesized in schizophrenia 
(22). It also may provide a means for eval- 
uating therapeutic agents that are selective- 
ly targeted toward alleviating cortical dopa- 
mine hypofunction. 
tected and telomeres shorten with succes-
about 85% of human tumors, which has led
to studies of the usefulness of telomerase for 
cancer diagnostics and therapeutics (5, 6). 

Telomerase RNA subunits have been 
identified and analyzed in ciliates, yeast 
and mammals (2, 7), but the protein sub- 
units have been elusive. In Tetrahymena, 
two telomerase-associated proteins (p80, 
p95) have been described (8), and p80 ho- 
mologs have been found in humans and 
rodents (9); the presence of catalytic active 
site residues in these proteins has not been 



Telomerase Catalytic Subunit Homologs from 
Fission Yeast and Human 
Catalytic protein subunits of telomerase from the ciliate Euplotes aediculatus and the
yeast Saccharomyces cerevisiae contain reverse transcriptase motifs. Here the homol-
ogous genes from the fission yeast Schizosaccharomyces pombe and human are iden-
tified. Disruption of the S. pombe gene resulted in telomere shortening and senescence,
and expression of mRNA from the human gene correlated with telomerase activity in cell
lines. Sequence comparisons placed the telomerase proteins in the reverse transcriptase
family but revealed hallmarks that distinguish them from retroviral and retrotransposon
relatives. Thus, the proposed telomerase catalytic subunits are phylogenetically con-
served and represent a deep branch in the evolution of reverse transcriptases.
established. Purification of telomerase from
the ciliate Euplotes aediculatus yielded two
proteins, p123 and p43 (10), that appear
unrelated to p80 and p95 (11). p123 con-
tains reverse transcriptase (RT) motifs and
is homologous to yeast Est2 (Ever shorter 
telomeres) protein (11), which is essential 
for telomere maintenance in vivo (12). The 
RT motifs of Est2p are essential for telo- 
meric DNA synthesis in vivo and in vitro 
(11, 13), supporting the conclusion that 
Est2p and p123 are the catalytic subunits of
telomerase. The question remained whether 
there are two telomerases in biology, one 
based on p80- and p95-like proteins and 
one on p123/Est2p-like proteins.

To determine if Est2p/p123 is conserved
among eukaryotes, we searched for ho- 
mologs in the fission yeast S. pombe and 
humans. Polymerase chain reaction (PCR) 
amplification of S. pombe DNA was carried 
out with degenerate-sequence primers de-
signed from the Euplotes p123 RT motifs B'
and C. Of the four prominent products 
generated, the ~120-base pair (bp) band 
encoded a peptide sequence homologous to 
p123 and Est2p. Using this PCR product as
a probe for colony hybridization, we identi- 
fied two overlapping clones from a genomic 
library and three from a cDNA library (14). 
None of the three cDNA clones was full 
length, so we used RT-PCR to obtain the 
NH2-terminal sequences (15). This puta-
tive telomerase reverse transcriptase gene,
trt1+, encoded a basic protein with a pre-
dicted molecular mass of 116 kilodaltons
(kD) (Fig. 1A). The sequence similarity to
p123 and Est2p was especially high in the
seven RT motifs (Table 1) and in motif T 
(Telomerase-specific) (Fig. 2). Fifteen in- 
trons, ranging from 36 to 71 bp, interrupted 
the coding sequence. All had consensus 
splice and branch site sequences (16). 
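A degenerate primer is a mixture of every DNA sequence that could encode the target peptide, so its degeneracy is the product of the codon counts for each residue. The sketch below computes that number under the standard genetic code; the peptides used are hypothetical, not the actual p123 motif B' or C sequences, and practical refinements (trimming the 3'-terminal codon, inosine substitution) are not modeled.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code (61 total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Count distinct DNA sequences that encode `peptide` (full codons, standard code)."""
    try:
        return prod(CODONS_PER_AA[aa] for aa in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]!r}") from None


# Hypothetical peptides for illustration only.
for peptide in ("MWDDY", "LSR"):
    print(peptide, primer_degeneracy(peptide))
```

Residues such as Met and Trp (one codon each) keep degeneracy low, which is one reason conserved motifs rich in such residues are attractive primer targets.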

If trt1+ encodes the telomerase catalytic
subunit in S. pombe, deletion of the gene 
would be expected to result in telomere 
shortening and perhaps cellular senescence 
as seen with the est2 mutants in S. cerevisiae 
(11, 13). To test this, we created two dele- 
tion constructs (Fig. 1A), one removing 
motifs B' through E in the RT domain, and 
the second deleting 99% of the open read- 
ing frame (ORF). Haploid cells grown from 
both types of spores showed progressive 
telomere shortening to the point where hy-
bridization to telomeric repeats became al-
most undetectable (Fig. 1B). Senescence
was indicated by (i) reduced ability of the 
cells to grow on agar, typically by the fourth 
streak-out after germination; (ii) the ap- 
pearance of colonies with increasingly 
ragged edges (Fig. 1C); and (iii) the increas-
ing fraction of elongated cells (Fig. 1D).
When individual enlarged [...], the ma-
jority underwent no further division. The
same trt1− cell population always contained
normal-size cells that continued to divide
but frequently produced nondividing prog-
eny. The telomerase-negative [...]
as documented in budding
yeast strains with deletions of telomere-
replication genes (12, 17).

A candidate human p123/Est2p/Trt1p
homolog was identified by a BLAST search
of the EST (expressed sequence tag) data-
base (GenBank AA281296). This EST was
the top-ranked match in sequence searches
with Euplotes p123 (P = 3.3 × 10^-6) and S.
pombe Trt1p (P = 9.7 × 10^-7). The human
EST was not found in searches with yeast 



[Figure 1, panel labels: Original PCR Product; trt1+; trt1−]

Fig. 1. The gene for the S. pombe telomerase [...] protein and phenotypes associated with its
deletion. (A) The trt1+ locus, the location of the PCR product that led to its identification, and
the regions replaced by ura4+ or his3+ genes in the trt1− mutants (K, Kpn I; Xb, Xba I; H, Hind III;
Xc, Xoa I; Xh, Xho I). (B) Telomere shortening phenotype of trt1− mutants. A trt1+/trt1− diploid (28)
was sporulated and the resulting tetrads were dissected and germinated on a YES (Yeast Extract
medium Supplemented with amino acids) plate (29). Colonies derived from each spore were grown
at 32°C for 3 days, and streaked successively to fresh YES plates every 3 days. A colony from each
round was placed in 6 ml of YES liquid culture at 32°C and grown to stationary phase, and genomic
DNA was prepared. After digestion with Apa I, DNA was subjected to electrophoresis on a 2.3%
agarose gel, stained with ethidium bromide to confirm approximately equal loading in each lane,
transferred to a nylon membrane, and hybridized to a telomeric DNA probe. The Apa I site is located
30 to 40 bp away from telomeric repeat sequences in chromosomes I and II. (C) Colony morphology
of trt1+ and trt1− cells. Cells plated on MM [Minimal Medium (29) with glutamic acid substituted
for NH4Cl] were grown for 2 days at 32°C prior to photography. (D) Micrographs of trt1+ and
trt1− cells grown as in (C).
Est2p, but subsequent pairwise comparison
of these sequences showed a convincing
match. Sequencing of the rest of the
cDNA clone containing the EST revealed
all eight TRT (Telomerase Reverse Tran-
scriptase) motifs, but not in a single ORF
(18). We used the sequence information
from this incomplete cDNA clone to iso-
late an extended cDNA clone from a li-
brary of 293 cells, an adenovirus E1-trans-
formed human embryonic kidney cell line
(19). This cDNA clone (pGRN121) had a
182-bp insert relative to the EST clone,
which increased the spacing between mo-
tifs A and B' (18) and put all seven RT
motifs and the telomerase-specific motif T
in a single contiguous ORF (Fig. 2).



RT-PCR amplification of RNA from 293 
cells and from testis each gave two prod- 
ucts differing by 182 bp (20). The larger 
and smaller products from testis RNA 
were sequenced and found to correspond 
exactly to pGRN121 and the EST cDNA, 
respectively. 

The relative abundance of hTRT mRNA 
was assessed in six telomerase-negative mortal 
cell strains and six telomerase-positive immor- 
tal cell lines (21) (Fig. 3). The steady-state 
level of hTRT mRNA was higher in immortal 
cell lines with active telomerase (6) than in 
any of the telomerase-negative cell strains 
tested. Telomerase activity was more strongly 
correlated with the abundance of hTRT 
mRNA than with that of telomerase RNA
(hTR) (7). In contrast, the abundance of
mRNA for the human p80 homolog TP1 (9) 
did not correlate with telomerase activity 
(Fig. 3). Thus, while our proposal that hTRT 
is the catalytic subunit of human telomerase is 
based mainly on protein structural features 

Table 1. Amino acid sequence identity between 
telomerase reverse transcriptases. Each value is 
% identity (% similarity in parentheses) based on 
RT motifs 1, 2, and A through E in Fig. 2C.
Fig. 2. Structure and RT sequence motifs of telomerase proteins. (A) Loca-
tions of telomerase-specific motif T and conserved RT motifs 1, 2, and A
through E (24) are indicated by colored boxes. The open rectangle labeled
HIV-1 (Human Immunodeficiency Virus) RT delineates the portion of this
protein shown in (B). pI, isoelectric point. (B) The crystal structure of the p66
subunit of HIV-1 RT (Brookhaven code 1HNV). Color-coding of RT motifs
matches that in (A). The view is from the back of the right hand, which allows
all motifs to be seen. (C) Multiple sequence alignment of telomerase RTs
and members of other RT families (Sc_al, cytochrome oxidase group II
intron 1-encoded protein from S. cerevisiae mitochondria; Dm_TART, re-
verse transcriptase from Drosophila melanogaster TART non-LTR retro-
transposable element). Boldface residues indicate identity of at least three
telomerase sequences in the alignment. Colored residues are highly con-
served in all RTs and shown as space-filled residues in (B). The number of
amino acids between adjacent motifs or to the end of the polypeptide is
indicated. TRT con and RT con, consensus sequences for telomerase RTs
(this work) and non-telomerase RTs (24) (amino acids are designated h,
hydrophobic: A, L, I, V, P, F, W, M; p, polar: G, S, T, Y, C, N, Q; c, charged: D,
E, H, K, R). Red arrowheads show some of the systematic differences be-
tween telomerase proteins and other RTs. Red rectangle below motif
E highlights the primer grip region discussed in the text. Abbreviations for the
amino acids are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H,
His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr;
V, Val; W, Trp; and Y, Tyr. The nucleotide sequences of the S. pombe trt1+
gene and the human TRT cDNA (pGRN121) have been deposited in GenBank
(AF015783 and AF015950, respectively).
(similarity of RT motifs, the T motif, molec-
ular mass > 100 kD, pI > 10), the correlation
of its mRNA expression level with activity
also supports this conclusion.
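Table 1 reports pairwise percent identity and similarity over the aligned RT motifs. As a minimal sketch, assuming an already aligned pair and treating two residues as "similar" when they fall in the same h/p/c class from the Fig. 2 caption (an assumption; the report does not state which similarity criterion was used), the two percentages can be computed as follows. Gap columns are skipped here; other tools use different denominators, so the numbers need not match Table 1.

```python
from __future__ import annotations

# Residue classes from the Fig. 2 caption, used here (as an assumption) for "similarity".
RESIDUE_CLASS = {
    aa: cls
    for cls, members in {"h": "ALIVPFWM", "p": "GSTYCNQ", "c": "DEHKR"}.items()
    for aa in members
}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) for two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in RESIDUE_CLASS and RESIDUE_CLASS[x] == RESIDUE_CLASS.get(y):
            similar += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments, for illustration only.
print(identity_and_similarity("LADDF", "IADDY"))
```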

Sequence alignment of the four telom- 
erase genes revealed features similar to oth- 
er reverse transcriptases, as well as differ- 
ences that serve as hallmarks of the telom- 
erase subgroup. The new T motif is one 
telomerase-specific feature not found in the 
other RTs examined. Another is the dis- 
tance between motifs A and B', which is 



Fig. 3. Expression of hTRT in telomerase-negative 
mortal cell strains (lanes 1 to 6) and telomerase- 
positive immortal cell lines (lanes 7 to 12). RT-PCR 
(27) for hTRT, hTR (human telomerase RNA com- 
ponent), TP1 (telomerase-associated protein re- 
lated to Tetrahymena p80), and GAPDH (to nor- 
malize for equal amounts of RNA template) was 
carried out on RNA from: (1) human fetal lung 
fibroblasts GFL, (2) human fetal skin fibroblasts 
GFS, (3) adult prostate stromal fibroblasts 31 YO, 
(4) human fetal knee synovial fibroblasts HSF, (5) 
neonatal foreskin fibroblasts BJ, (6) human fetal 
lung fibroblasts IMR90, (7) melanoma LOX IMVI, (8)
leukemia U251, (9) NCI H23 lung carcinoma, (10)
colon adenocarcinoma SW620, (11) breast tumor 
MCF7, and (12) human 293 cells. 



Fig. 4. A possible phylogenetic tree 
of telomerases and retroelements 
rooted with RNA-dependent RNA 
polymerases. After sequence align- 
ment of motifs 1, 2, and A through E 
(178 positions, Fig. 2C) from four 
TRTs, 67 RTs, and three RNA polymerases,
[...] using the Neighbor Joining method
(30). Elements from the same class 
that are located on the same 
branch of the tree are simplified as a 
box. The length of each box corre- 
sponds to the most divergent ele- 
ment within that box. 



longer in the TRTs than in other RTs (Fig. 
2A). These amino acids can be accommo- 
dated as an insertion within the "fingers" 
region of the structure that resembles a
right hand (22, 23) (Fig. 2B). Within the 
motifs, there are a number of substitutions 
of amino acids (red arrowheads in Fig. 2C) 
that are highly conserved among the other 
RTs. For example, in motif C the two as- 
partic acid residues (DD) that coordinate 
active site metal ions (22) occur in the 
context hxDD(F/Y) in the telomerase RTs 
compared to (F/Y)xDDh in the other RTs
(24). Another systematic change character-
istic of the telomerase subgroup occurs in
motif E, where WxGxSx appears to be the 
consensus among the telomerase proteins, 
whereas hLGxxh is characteristic of other 
RTs (24). This motif E is called the "primer 
grip" (23), and mutations in this region 
affect RNA priming but not DNA priming
(25). Because telomerase uses a DNA prim-
er, the chromosome 3' end, it is not unex-
pected that it should differ from other RTs
in this region. Given that the simple 
change from Mg2+ to Mn2+ allows HIV RT
to copy a small region of a template in a 
repetitive manner (26), it is tempting to 
speculate that some of the distinguishing 
amino acids in the TRTs may cause telom- 
erase to catalyze repetitive copying of the 
template sequence of its tightly bound 
RNA subunit. 
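The consensus notation used above (hxDD(F/Y), (F/Y)xDDh, WxGxSx, hLGxxh) maps directly onto regular expressions once the residue classes from the Fig. 2 caption are substituted. The following is a hypothetical Python sketch that reads lowercase x as "any residue", lowercase h/p/c as the caption's classes, uppercase letters as literal amino acids, and (F/Y) as a choice; the peptide searched is invented for illustration.

```python
from __future__ import annotations

import re

# Residue classes from the Fig. 2 caption; 'x' is taken to mean any residue.
CLASS_PATTERNS = {
    "h": "[ALIVPFWM]",  # hydrophobic
    "p": "[GSTYCNQ]",   # polar
    "c": "[DEHKR]",     # charged
    "x": "[A-Z]",       # any residue (any uppercase letter)
}


def consensus_to_regex(consensus: str) -> str:
    """Translate notation such as 'hxDD(F/Y)' into a regular-expression string."""
    parts: list[str] = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":  # alternatives such as (F/Y)
            end = consensus.index(")", i)
            parts.append("[" + "".join(consensus[i + 1:end].split("/")) + "]")
            i = end + 1
            continue
        if ch in CLASS_PATTERNS:
            parts.append(CLASS_PATTERNS[ch])
        elif ch.isalpha() and ch.isupper():
            parts.append(ch)  # literal amino acid
        else:
            raise ValueError(f"unexpected symbol {ch!r} in {consensus!r}")
        i += 1
    return "".join(parts)


def find_motif(sequence: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched peptide) for non-overlapping hits."""
    pattern = re.compile(consensus_to_regex(consensus))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


peptide = "GGLADDFKK"  # hypothetical fragment
print(find_motif(peptide, "hxDD(F/Y)"))  # telomerase-style motif C: [(2, 'LADDF')]
print(find_motif(peptide, "(F/Y)xDDh"))  # other-RT motif C: []
```

Scanning a sequence with both patterns is one quick way to see which side of the systematic difference a given RT falls on.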

Using the seven RT domains (Fig. 2C) 
defined by Xiong and Eickbush (24), we 
constructed a phylogenetic tree that in- 
cludes the four telomerase RTs (Fig. 4). The 
TRTs appear to be more closely related to 
RTs associated with msDNA (multicopy 
single-stranded DNA), group II introns, 
and non-LTR (Long Terminal Repeat) ret- 
rotransposons than to the LTR-retrotrans- 
poson and viral RTs. The relationship of 
the telomerase RTs to the non-LTR branch 
of retroelements is intriguing, given that 
the latter elements have replaced telomer-
ase for telomere maintenance in Drosophila
(27). However, the most striking finding is
that the TRTs form a discrete subgroup, 
about as closely related to the RNA-depen- 
dent RNA polymerases of plus-stranded 
RNA viruses such as poliovirus as to retro- 
viral RTs. In view of the fact that the four 
telomerase genes are from evolutionarily 
distant organisms — protozoan, fungi, and 
mammals — this separate grouping cannot 
be explained by lack of phylogenetic diver- 
sity in the data set. Instead, this deep 
branching suggests that the telomerase RTs 
are an ancient group, perhaps originating 
with the first eukaryote. 
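The tree in Fig. 4 was built with the Neighbor Joining method (30). Because the report gives no implementation details, the following is only a textbook sketch of the agglomerative neighbor-joining procedure, not the authors' pipeline. The five-taxon distance matrix is a standard illustrative additive example, not telomerase data.

```python
def neighbor_joining(labels, dist):
    """Return (parent, child, branch_length) edges of an unrooted neighbor-joining tree."""
    if len(labels) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(labels)
    d = [list(map(float, row)) for row in dist]
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to the joined pair.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = f"({nodes[i]},{nodes[j]})"
        edges += [(new, nodes[i], li), (new, nodes[j], lj)]
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new]
    edges.append((nodes[0], nodes[1], d[0][1]))
    return edges


labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(labels, dist):
    print(f"{parent} -- {child}: {length:.2f}")
```

On additive distances like these the method recovers the true branch lengths; on real alignment-derived distances it can return small negative lengths, which published implementations usually handle explicitly.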

The primary sequence of hTRT and 

merase may be used to discover telomerase 
inhibitors, which in turn will permit addi- 
tional testing of the anti-tumor effects of 
telomerase inhibition. The correlation be-
tween hTRT mRNA levels and human
telomerase activity shown here indicates 
that hTRT also has promise for cancer di- 
agnosis. With an essential protein compo- 
nent of telomerase now in hand, the stage is 
set for more detailed investigation of fun- 
damental and applied aspects of this ribo- 
nucleoprotein enzyme. 
Contrasting Genetic Influence of CCR2 and
The critical role of chemokine receptors (CCR5 and CXCR4) in human immunodeficiency
virus-type 1 (HIV-1) infection and pathogenesis prompted a search for polymorphisms
in other chemokine receptor genes that mediate HIV-1 disease progression. A mutation
(CCR2-64I) within the first transmembrane region of the CCR2 chemokine and HIV-1
receptor gene is described that occurred at an allele frequency of 10 to 15 percent among
Caucasians and African Americans. Genetic association analysis of five acquired im-
munodeficiency syndrome (AIDS) cohorts (3003 patients) revealed that although CCR2-64I
exerts no influence on the incidence of HIV-1 infection, HIV-1-infected individuals
carrying the CCR2-64I allele progressed to AIDS 2 to 4 years later than individuals
homozygous for the common allele. Because CCR2-64I occurs invariably on a
CCR5-+-bearing chromosomal haplotype, the independent effects of CCR5-Δ32 (which also
delays AIDS onset) and CCR2-64I were determined. An estimated 38 to 45 percent of
AIDS patients whose disease progresses rapidly (less than 3 years until onset of AIDS
symptoms after HIV-1 exposure) can be attributed to their CCR2-+/+ or CCR5-+/+
genotype, whereas the survival of 28 to 29 percent of long-term survivors, who avoid
AIDS for 16 years or more, can be explained by a mutant genotype for CCR2 or CCR5.
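The allele frequencies quoted above translate into expected carrier frequencies through Hardy-Weinberg proportions. The sketch below assumes Hardy-Weinberg equilibrium, and it also shows Levin's standard population attributable fraction formula. The abstract does not say how its 38 to 45 percent and 28 to 29 percent figures were derived, so the second function only illustrates the general concept, with hypothetical inputs.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Expected fraction of individuals carrying at least one copy of an allele,
    ASSUMING Hardy-Weinberg equilibrium: 2pq + q^2 = 1 - (1 - q)^2."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - allele_freq) ** 2


def attributable_fraction(exposed_fraction: float, relative_risk: float) -> float:
    """Levin's population attributable fraction: p(RR - 1) / (1 + p(RR - 1))."""
    excess = exposed_fraction * (relative_risk - 1.0)
    return excess / (1.0 + excess)


for q in (0.10, 0.15):  # allele-frequency range quoted for CCR2-64I
    print(f"q = {q:.2f}: expected carriers = {carrier_frequency(q):.4f}")

# Hypothetical inputs only: half the population "exposed", relative risk of 3.
print(f"illustrative attributable fraction = {attributable_fraction(0.5, 3.0):.2f}")
```

Under these assumptions, allele frequencies of 10 and 15 percent imply that roughly 19 and 28 percent of people carry at least one CCR2-64I allele.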
The nexus of chemokine immunobiology
and AIDS pathogenesis has revealed un-
tapped avenues for resolving patterns of
HIV-1 disease progression, for clarifying
epidemiologic heterogeneity, and for the design
of therapies (1-6). Identification of the
CC-chemokines RANTES, MIP-1α, and
MIP-1β as suppressor factors produced by
CD8 cells that counter infection by certain
HIV-1 strains (7) previewed the
critical identification of two chemokine re- 
ceptor molecules, CXCR4 (formerly named 
LESTR/fusin) and CCR5 (formerly CKR5), 
as cell surface coreceptors with CD4 for 
HIV-1 infection (8-13). Additional che-
mokine receptors CCR2 and CCR3 also
have been implicated as HIV-1 coreceptors
on certain cell types (12-14). HIV-1-in- 
fected patients harbor predominantly mac- 
rophage-tropic HIV-1 isolates during early 
stages of infection, but accumulate increas- 
ing amounts of T cell-tropic strains just 
before accelerated T cell depletion and pro- 
gression to AIDS. The identification of 
"dual"-tropic HIV-1 strains over the course 
of infection suggests that such strains may 
represent an intermediate between macro-
phage- and T cell-tropic populations (11-
13, 15). This tropic transition indicates that
viral adaptation from CCR5 to CXCR4 
receptor use may be a key step in progres- 
sion to AIDS (16). 
Telomerase catalysis: A phylogenetically conserved
reverse transcriptase 

Victoria Lundblad

Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 



Replication of telomeres, the ends of eukaryotic chromo- 
somes, is the responsibility of the enzyme telomerase. Since its 
discovery 13 years ago, research on this unusual DNA poly- 
merase has revealed a series of surprises. The first of these was 
the realization that information within the enzyme itself 
determines the sequence of its product: a portion of a telom- 
erase RNA subunit is the template that dictates the nucleotides 
added onto the telomere (1, 2). The interest in telomere 
replication has increased further during the past several years 
because of observations indicating that maintenance of telo- 
mere length by telomerase could provide the molecular basis 
for determining the lifespan of cells in culture (3). 
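Because a portion of the telomerase RNA serves as the template, the DNA added to the telomere is, to a first approximation, the reverse complement of that RNA segment. The sketch below shows only this base-pairing step. It ignores template alignment and translocation, and it uses the commonly cited Tetrahymena template region 5'-CAACCCCAA-3' purely for illustration.

```python
# Complement of each RNA template base in the DNA product (U pairs with A).
RNA_TO_DNA_COMPLEMENT = str.maketrans({"A": "T", "U": "A", "G": "C", "C": "G"})


def product_from_template(template_rna_5to3: str) -> str:
    """DNA (written 5'->3') synthesized by copying an RNA template written 5'->3'."""
    template = template_rna_5to3.upper()
    unexpected = set(template) - set("ACGU")
    if unexpected:
        raise ValueError(f"not an RNA sequence: {sorted(unexpected)}")
    # Complement each base, then reverse so the product reads 5'->3'.
    return template.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


print(product_from_template("CAACCCCAA"))  # TTGGGGTTG
```

The product contains the TTGGGG repeat found at Tetrahymena telomeres, which is the sense in which information within the enzyme dictates the sequence it adds.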

The most recent insight has been the discovery that telom- 
erase is a reverse transcriptase (4, 5), with catalysis provided 
by a protein subunit with striking similarities to conventional 
reverse transcriptases. The genes encoding the TERT (for 
telomerase reverse transcriptase; reviewed in ref. 6) proteins 
have been recovered from a diversity of species. Each of these 
proteins exhibits sequence features previously observed in 
reverse transcriptases, as well as a telomerase-specific T motif 
(ref. 7 and references therein). Mutation of key residues 
predicted to be critical for catalysis (by comparison to the 
reverse transcriptase active site) abolishes telomerase activity 
in yeast and humans (4, 5, 8), and expression of the human 
RNA and TERT subunits in an in vitro translation system is 
sufficient to reconstitute activity (8, 9). Although reliance on 
a templating RNA component had already suggested parallels 
between this enzyme and other RNA-dependent DNA poly- 
merases, the demonstration that the telomerase catalytic sub- 
unit exhibited structural and enzymatic similarities to conven- 
tional reverse transcriptases provided direct mechanistic sup- 
port for these comparisons. This result also has striking 
implications for reverse transcriptase evolution (10, 11), by 
demonstrating that such enzymes are not employed solely for 
replication of parasitic genetic elements but are also necessary 
for normal cellular proliferation. Furthermore, the cloning of 
the human TERT (hTERT) subunit permitted a direct test of 
the hypothesized role of telomere replication in the lifespan of 
normal human cells in culture: ectopic expression of hTERT 
in telomerase-negative human diploid fibroblasts restored 
enzyme activity and conferred an ability to proliferate well 
beyond the normal senescence point (12). 

In this issue of the Proceedings, one aspect of telomerase 
research has come full circle, with the cloning of the TERT 
protein from the ciliate Tetrahymena (7, 13), the source of the 
first discovered telomerase activity. The ciliated protozoa have 
contributed greatly to our understanding of telomere biology 
because of an unusual feature of ciliate development. During 
the formation of a new macronucleus after mating, de novo 
addition of telomeres occurs on the ends of hundreds of 
thousands of newly formed minichromosomes (14). Thus, 
ciliates such as Tetrahymena, Euplotes, and Oxytricha have
proven to be rich sources of the factors required for telomere
replication and maintenance. As a consequence, the first two
telomerase-associated proteins, p80 and p95, were identified
after purification of the Tetrahymena telomerase complex (15).
On the basis of limited sequence similarities with other poly-
merases, p95 was proposed to contain the catalytic active site
of this enzyme (15). However, p95 showed no homology to the
emerging family of TERT proteins. This presented a potential
puzzle, invoking the possibility of an alternative class of
telomerase enzymes that utilized a different catalytic mechanism.

PNAS is available online at http://www.pnas.org.

This possibility has now been laid to rest by two reports in
this issue, from the Cech and Collins laboratories, showing that
the Tetrahymena telomerase relies on a reverse transcriptase
subunit for catalysis (7, 13). The gene encoding the Tetrahy- 
mena TERT protein was cloned by using a molecular ap- 
proach, and the predicted protein displayed both the expected 
seven reverse transcriptase motifs and the T motif. Expression 
of this TERT protein and the RNA subunit in reticulocyte 
extracts was sufficient to reconstitute polymerization activity 
(although the high processivity observed with native enzyme 
was not achieved with this reconstituted core complex), and 
catalysis by the reconstituted enzyme was abolished by muta- 
tions similar to those previously tested for yeast and human 
telomerases (13). The Tetrahymena catalytic subunit protein 
also could be coimmunoprecipitated with p95, p80, and the 
RNA subunit, arguing that all four components are present in 
a single complex (13). Expression of the TERT mRNA was 
observed to increase dramatically after mating (7), consistent 
with the greatly increased requirements for telomere addition 
during macronuclear development. 

Thus, it is now clear that the well characterized Tetrahymena 
enzyme has a catalytic subunit that shows both structural and 
evolutionary conservation with other telomerases. This result 
argues for the conceptually satisfying view that telomerase, 
regardless of its source, has a phylogenetically conserved core 
that minimally consists of the RNA component and a TERT 
protein. However, this conservation does not hold up as well 
when potential holoenzyme-associated proteins, isolated in 
several different systems, are compared. The telomerase re- 
verse transcriptase subunit was discovered and characterized 
as a result of parallel biochemical and genetic endeavors in 
Euplotes and Saccharomyces cerevisiae; these efforts also led to 
the identification of telomerase-associated proteins in both 
species. Extensive purification of an active enzyme from 
Euplotes aediculatus resulted in a complex consisting of the 
RNA and catalytic protein subunits, as well as a copurifying 
43-kDa protein; however, p80- and p95-like proteins were not 
detected in this complex (16). An alternative strategy for the 
identification of yeast telomerase components relied on a 
screen for mutants of yeast that exhibited an in vivo telomere 
replication defect (17, 18). This uncovered EST2, the yeast 
homolog of the telomerase reverse transcriptase gene, as well 
as additional EST genes that, when mutated, exhibited a
mutant phenotype identical to that displayed by mutations in
the core enzyme complex. Est1, Est2, and Est3 proteins each
coimmunoprecipitate with yeast telomerase activity (ref. 19;
T. R. Hughes, R. G. Weilbaecher, and V.L., unpublished
observations), suggesting that Est1 and Est3 could be compo-
nents of the yeast holoenzyme. However, the yeast enzyme has
not been purified sufficiently to be able to assess whether all 
three Est proteins are present in a single complex with the 
telomerase RNA. Strikingly, the telomerase-associated Est 
proteins show no similarity to any of the ciliate proteins. 
Furthermore, a search of the completely sequenced yeast 
genome has not revealed any homologs of the Tetrahymena p80 
and p95 proteins. 

So what is the explanation for the lack of convergence 
between these different sets of telomerase-associated pro- 
teins? Several hypotheses come to mind, which are not nec- 
essarily exclusive. The first possibility is that telomerase may be 
a large holoenzyme with a number of associated proteins, and 
efforts in these three organisms have succeeded in identifying 
only a partial subset. In support of this possibility, human, rat, 
and murine homologs of p80 have been isolated (20, 21), and 
the human p80 homolog has been shown to be in a complex 
with the hTERT subunit of telomerase (22), consistent with a 
similar demonstration for the equivalent Tetrahymena proteins 
(13). This finding excludes the possibility that p80 is a ciliate- 
specific telomerase protein and also raises the expectation that 
a similar mammalian p95 homolog may follow. This cross- 
species conservation does make the lack of a recognizable 
yeast version even more puzzling, but perhaps yeast homologs 
may not be readily identified on the basis of primary sequence. 
In fact, one proposed set of orthologs may be p95 and the yeast 
Est1 protein, as these two telomerase-associated proteins have
a similar set of in vitro biochemical properties. Both proteins 
exhibit low-affinity, but sequence-specific, binding to single- 
strand telomeric DNA substrates (23, 24). In addition, both 
proteins interact, albeit nonspecifically, with RNA in vitro (23, 
24). Such properties argue for roles in recognition of the 
telomeric DNA substrate and interaction with the telomerase 
RNA. 

Alternatively, the diversity of telomerase-associated pro- 
teins may be a reflection of the differing requirements faced 
by telomerase in various biological situations, such that al- 
though the enzyme core may be conserved, at least a subset of 
the proteins that associate with the holoenzyme will be species- 
specific. One obvious species difference that may be mediated 
by components of the telomerase holoenzyme is the substan- 
tial variation in telomere length, ranging from 50 bp for some 
ciliates to >50 kb in one species of mouse. In addition, an
enzyme that is responsible for de novo telomere addition as a 
part of chromosome healing might be expected to have dif- 
fering cofactor requirements than an enzyme complex that is 
responsible for telomere length maintenance during vegetative 
growth. Even within a single species (Euplotes crassus), bio-
chemical differences have been noted between telomerase
isolated from vegetatively growing cells versus enzyme from
mated cells (25). 
Thus, although the basis for the differences in holoenzyme
composition between different species is not yet understood, 
the results of the last year have shown that a phylogenetically 
conserved TERT protein is common to all telomerases. This 
has provided specific insight into the mechanism of telomerase 
catalysis, as well as establishing a sound foundation for a future 
detailed understanding of the composition of the telomerase 
holoenzyme. With four components of the most thoroughly 
studied telomerase now identified and available, Tetrahymena 
once again establishes itself as a system that will contribute 
important information about the biochemical activities of the 
components of this unusual DNA polymerase. 
The gypsy group of long-terminal-repeat retrotransposons contains elements having
the same order of enzyme domains in the pol gene as do retroviruses. Elements in
the gypsy group are now known from yeast, filamentous fungi, plants, insects, and 
echinoids. Reverse transcriptase and RNase H amino acid sequences from elements 
in the gypsy group, including the recently described SURL elements, TED, Cftl,
and Ulysses, were aligned and analyzed by using parsimony and bootstrapping
methods, with plant caulimoviruses and/or retroviruses as outgroups. Clades sup-
ported at the 95% level after bootstrapping include (1) 17.6 with 297 and (2) all
of the SURL elements together. Other likely relationships supported at lower boot-
strap confidence intervals include (1) SURL elements with mag, (2) 17.6 and 297
with TED, and this collective group with 412 and gypsy, (3) Tf1 with Cftl, (4)
IFG7 with Del, and (5) all of the retrotransposons in the gypsy group together, to
the exclusion of Ty3. In contrast with an earlier analysis, our results place mag
within the gypsy group rather than outside of a cluster that contains gypsy group
retrotransposons and plant caulimoviruses. Several features of retrotransposon ge-
nomes provide further support for some of the aforementioned relationships. The
union of SURL elements with mag is supported by the presence of two RNA
binding sites in the nucleocapsid protein. Location of the tRNA primer binding
site and the presence of a long open reading frame 3' to the pol gene support the
17.6-297-TED-412-gypsy cluster.



Introduction 



Retrotransposons containing long terminal repeats (LTRs) have now been iden-
tified in the genomes of a number of organisms and can be divided into two groups
on the basis of both phylogenetic analysis of amino acid sequences and structural
features of the genome (Xiong and Eickbush 1988, 1990; Doolittle et al. 1989). In
the copia group, with representatives from Drosophila (copia and 1731), yeast (Ty1),
plants (Tnt1, Ta1-3, Tst1, Wis, and Bis), and Physarum (Tp1), the integrase gene is
located between the protease and reverse transcriptase genes. In the gypsy group, with
representatives from insects (gypsy, 412, 17.6, 297, mag, micropia, and Ulysses), yeast
(Ty3 and Tf1), filamentous fungi (Cftl), echinoids (SURL elements), and plants
(IFG7 and Del), the integrase gene is located 3' to the RNase H gene. The gypsy
group of LTR retrotransposons is related to plant caulimoviruses and to retroviruses, 
on the basis of reverse transcriptase sequences (Xiong and Eickbush 1990). 

1. Key words: retrotransposon, reverse transcriptase, RNase H.
Here, we examine phylogenetic relationships among members of the gypsy group
by using amino acid sequences from the reverse transcriptase and RNase H proteins.
Previous phylogenetic analyses of the gypsy group include Doolittle et al. (1989) and
Xiong and Eickbush (1990). Xiong and Eickbush (1990) included 10 elements from
the gypsy group in their analysis of reverse transcriptase sequences. Since that time,
sequences for SURL elements, TED, Tf1, Cftl, and Ulysses have become available.
We also evaluate the distribution and evolution of structural features in these retro- 
transposons in the light of amino acid-based phylogenies. Several structural features 
corroborate phylogenetic analysis on the basis of amino acid sequences. 

Methods 

Amino acid sequences and features of retrotransposons were obtained from 
GenBank and from references given in figure 1. Sequences of representative plant
caulimoviruses were also obtained from GenBank. Delineation of boundaries for the
reverse transcriptase protein corresponds to that used by Xiong and Eickbush (1990).
Delineation of RNase H sequence boundaries roughly corresponds to the region iden-
tified by McClure (1991). Multiple alignments were made by using CLUSTAL (Higgins
and Sharp 1988), and adjustments were made by eye when conserved residues defined
in Xiong and Eickbush (1990) and McClure (1991) were not aligned. Maximum
parsimony and bootstrapping were performed by using PAUP, version 3.0s (Swofford
1991), with gaps counted as missing data. Plant caulimoviruses and/or retroviruses
were used as outgroups. Each step on a parsimony tree corresponds to a single amino
acid replacement. Because exact methods of finding minimum-length trees could not
be used for the complete set of sequences, a heuristic approach using 100 replications
with random input orders was employed. We also used a starting tree consistent with
the tree given in Xiong and Eickbush (1990) as a baseline for searching for shorter
trees. A distance matrix based on the aligned amino acid sequences was constructed
by using the Kimura (1983, p. 175) option of the PROTDIST program in PHYLIP
and was analyzed by using the neighbor-joining method (Saitou and Nei 1987). 
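The distance step described above can be illustrated with a short sketch. It assumes the Kimura (1983) approximation for protein distances, d = -ln(1 - p - 0.2p^2), where p is the proportion of aligned amino acid sites that differ. Here, columns with a gap in either sequence are simply skipped. The function names and toy sequences are hypothetical, and the code is not PROTDIST itself.

```python
import math
from typing import Dict, Optional

GAP_CHARS = set("-?")  # symbols treated as missing data in this sketch


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def kimura_protein_distance(a: str, b: str) -> Optional[float]:
    """Kimura approximation d = -ln(1 - p - 0.2 p^2); None if too divergent."""
    p = p_distance(a, b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return None  # the approximation breaks down (saturated pair)
    return -math.log(arg)


def distance_matrix(seqs: Dict[str, str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Symmetric matrix of pairwise distances keyed by sequence name."""
    names = list(seqs)
    return {
        i: {j: (0.0 if i == j else kimura_protein_distance(seqs[i], seqs[j])) for j in names}
        for i in names
    }


if __name__ == "__main__":
    toy = {"x": "ACDE", "y": "ACDF", "z": "A-DF"}  # hypothetical aligned fragments
    for row, cols in distance_matrix(toy).items():
        print(row, cols)
```

A matrix like this is the input that the neighbor-joining step then works on.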

Results 

Alignments 

Figure 1 shows a multiple alignment of amino acid sequences from the reverse
transcriptase region. Overall, this alignment is similar to that of Xiong and Eickbush
(1990), and most of the conserved blocks in their alignment are retained in the present
alignment. Figure 2 shows an alignment of sequences from the RNase H region.

FIG. 2. Alignment of amino acid sequences from the RNase H region for the gypsy group of
retrotransposons and several plant caulimoviruses. Abbreviations are given in fig. 1. An asterisk (*) denotes a …

Phylogenetic Trees

Two minimum-length trees containing 2,219 amino acid replacements were found
for the combined reverse transcriptase/RNase H sequences. One of these trees, rooted
by using plant caulimoviruses, is shown in figure 3. On the second tree (not shown)
the Tf1-Cftl and IFG7-Del groups switch positions, and micropia is closer to
Ulysses than to the SURL-mag group. Also shown on the tree in figure 3 are the
consensus results of 500 bootstrap replications. Results summarized in figure 3 show
(1) a likely sister-group relationship (86%) of TED (from the cabbage looper Tricho-
plusia ni) with 17.6 plus 297 (from Drosophila), (2) a likely sister-group relationship
(73%) between the plant retrotransposons IFG7 and Del, (3) a likely sister-group
relationship (80%) between SURL elements and mag, (4) a likely sister-group
relationship (81%) between Tf1 (from fission yeast) and Cftl (from the fungal tomato
pathogen Cladosporium fulvum), and (5) a possible clade (62%) containing 17.6, 297,
TED, 412, and gypsy. In addition, Ty3 is an outgroup to all other retrotransposons 
in the gypsy group on 70% of the bootstrap trees. Ulysses and micropia group with 
SURL elements and mag on both minimum-length trees, but this association does 
not hold up after bootstrapping. Likewise, the minimum-length tree shown in figure 
3 supports a clade containing all of the retrotransposons that occur in metazoans, but 
this branch does not occur on the second minimum-length tree, nor is it supported 
by bootstrapping. In contrast to the tree in figure 3, the shortest tree consistent with 
that of Xiong and Eickbush (1990) is 30 steps longer, at 2,249 steps. 
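The step counts above (2,219 versus 2,249) are parsimony tree lengths: the minimum number of amino acid replacements a topology requires. The sketch below shows how such a length can be scored for one fixed, rooted binary tree using Fitch's set-intersection algorithm. It treats gaps ("-" or "?") as missing data that can take any state. It is only an illustration. It scores a single tree, whereas PAUP also searches among many trees. The nested-tuple tree encoding and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# Hypothetical encoding: a leaf is a sequence name; an internal node is a
# pair (left, right). The tree is rooted and strictly binary.
Tree = Union[str, Tuple["Tree", "Tree"]]
MISSING = set("-?")  # gap symbols, treated as missing data (any state)


def _fitch_site(tree: Tree, column: Dict[str, str]) -> Tuple[Optional[FrozenSet[str]], int]:
    """Fitch pass for one column: return (state set, steps); None means any state."""
    if isinstance(tree, str):
        state = column[tree].upper()
        return (None if state in MISSING else frozenset(state)), 0
    left, right = tree
    left_set, left_steps = _fitch_site(left, column)
    right_set, right_steps = _fitch_site(right, column)
    steps = left_steps + right_steps
    if left_set is None:  # missing data never forces a replacement
        return right_set, steps
    if right_set is None:
        return left_set, steps
    common = left_set & right_set
    if common:
        return common, steps
    # Empty intersection: one amino acid replacement is required here.
    return left_set | right_set, steps + 1


def fitch_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total number of replacements (steps) the tree requires for the alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for i in range(lengths.pop()):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += _fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    toy = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}  # hypothetical data
    print(fitch_length((("A", "B"), ("C", "D")), toy))  # 2
    print(fitch_length((("A", "C"), ("B", "D")), toy))  # 3
```

Comparing lengths in this way, the shorter topology is preferred under parsimony. The same logic underlies the 30-step difference reported above.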

When we converted our sequence alignments to distances by using the Kimura 
option of PROTDIST (PHYLIP, version 3.5; Felsenstein 1993) and then employed 
the neighbor-joining method, the resulting tree (not shown) showed some differences 
from the minimum-length trees, but all of the branches that are supported at the 50% 
level in figure 3 are also supported on the neighbor-joining tree. 
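For readers unfamiliar with the neighbor-joining method of Saitou and Nei (1987), a compact sketch follows. It assumes a symmetric distance matrix (for example, Kimura distances) and returns an unrooted tree in Newick notation. This is a textbook formulation written for illustration, not the PHYLIP program. The toy matrix is hypothetical: it was built from the additive tree ((A:1,B:2):1,(C:1,D:3)) so that the correct answer is known.

```python
from typing import List, Optional, Tuple


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string."""
    if len(names) < 2 or len(d) != len(names):
        raise ValueError("need a square matrix with at least two taxa")
    nodes = list(names)
    dist = [list(map(float, row)) for row in d]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in dist]  # net divergence of each node
        best: Optional[Tuple[float, int, int]] = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * dist[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        parent = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        # Distances from the new parent to every node.
        to_parent = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        dist = [[dist[a][b] for b in keep] + [to_parent[a]] for a in keep]
        dist.append([to_parent[k] for k in keep] + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Two nodes remain; put the whole connecting branch on the first one.
    return f"({nodes[0]}:{dist[0][1]:.4g},{nodes[1]}:0);"


if __name__ == "__main__":
    D = [[0, 3, 3, 5], [3, 0, 4, 6], [3, 4, 0, 4], [5, 6, 4, 0]]  # hypothetical
    print(neighbor_joining(["A", "B", "C", "D"], D))
```

On this toy matrix the function recovers the generating topology and branch lengths. Real protein distances are not perfectly additive, which is one reason neighbor-joining and parsimony trees can differ, as noted above.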

Minimum-length trees (not shown) based on reverse transcriptase versus RNase 
H sequences exhibit several conflicts; for example, SURL elements cluster with mag 
on reverse transcriptase trees but cluster with the two gypsy elements on RNase H 
trees. However, all of the conflicts involve branches that are not supported after boot-
strapping. Bootstrapping the reverse transcriptase and RNase H sequences, respectively,
provides support for the following: 17.6 plus 297 (94% and 97%), and for this group
with TED (85% and 64%); the two gypsy elements together (100% and 100%); the
three SURL elements together (100% and 100%) with SURL (Sp) and SURL (Tg)
as nearest neighbors (96% and 90%); and Tf1 plus Cftl (60% and 73%). In addition,
bootstrapping reverse transcriptase sequences provides support for SURL elements 
with mag (60%), IFG7 plus Del (71%), and all of the retrotransposons together, 
except Ty3 (52%). 
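The bootstrap percentages above come from a three-step procedure. Alignment columns are resampled with replacement, a tree is rebuilt for each pseudo-replicate, and the program counts how often each clade reappears. A minimal sketch follows. The tree-building step is passed in as a function. `infer_clades` is a hypothetical placeholder for a parsimony or neighbor-joining search that returns the set of clades, each a frozenset of element names. The default of 500 replicates matches the number reported for figure 3; the seed is arbitrary.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Set

Clade = FrozenSet[str]


def resample_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: Dict[str, str],
    infer_clades: Callable[[Dict[str, str]], Set[Clade]],
    replicates: int = 500,
    seed: int = 1,
) -> Dict[Clade, float]:
    """Percentage of pseudo-replicate trees that contain each clade."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(replicates):
        clades = infer_clades(resample_columns(alignment, rng))
        counts.update(set(clades))  # count each clade at most once per replicate
    return {clade: 100.0 * n / replicates for clade, n in counts.items()}
```

A clade reported at, say, 86% would appear in 430 of 500 replicate trees under this procedure.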

Features of gypsy-like Elements 

Table 1 summarizes the distribution of seven different features of retrotransposons 
in the gypsy group. The phylogenetic significance of these features is discussed below. 

Discussion 

Xiong and Eickbush (1988, 1990) previously examined relationships among re- 
troid elements, including retrotransposons in the gypsy group, on the basis of reverse 
transcriptase sequences. One of the differences on the Xiong and Eickbush (1990) 
tree is that mag is outside of a cluster containing other retrotransposons in the gypsy 
group as well as plant caulimoviruses. To test this hypothesis with our data, it was 
necessary to include retroviruses as an outgroup to the collective group. We limited 
this analysis to reverse transcriptase sequences because of the difficulty in aligning 
RNase H sequences. Retroviruses clearly root the tree (not shown) such that the plant
caulimovirus and retrotransposon groups (including mag) are each monophyletic.

Two other differences on the Xiong and Eickbush (1990) tree are as follows: (1)
Ty3 is not peripheral to other gypsy retrotransposons but occupies a position close to
IFG7 and Del, and (2) 412 is the most peripheral member of the gypsy cluster, except
mag. Whether we (1) use parsimony or neighbor-joining methods, (2) include RNase
H and reverse transcriptase or just reverse transcriptase sequences, or (3) restrict our
analysis to the reverse transcriptase sequences available to Xiong and Eickbush (1990), 
Ty3 occupies the most peripheral position among retrotransposons in the gypsy group, 
and 412 clusters with the insect elements gypsy, 297, 17.6, and TED. 

The overall congruence between reverse transcriptase and RNase H bootstrap 
trees indicates that a similar phylogenetic signal is present in both, although, when
taken separately, each of these proteins provides less resolution than the two provide
in combination. One of the implications of the overall congruence between
bootstrap trees is that reverse transcriptase and RNase H have similar evolutionary 
histories without any interelement recombination that might cause striking differences. 

If Ty3 is taken as an outgroup to all of the other retrotransposons, then the 
implied primitive character states for the characters in table 1 are +1 ribosomal frame-
shifting, one RNA binding site, tRNA methionine, a +2 location of the tRNA primer
binding site (PBS), and lack of a long open reading frame (ORF) 3' to the pol gene. 
On the basis of these designations of primitive character states, several of the aspects 
of genome structure given in table 1 offer additional support for some of the branches 
on the tree in figure 3. First, 17.6, 297, and TED are united by the putative shared 
derived character of tRNA serine, although a putative tRNA serine also occurs in
Cftl (McHale et al. 1992). Second, 17.6, 297, TED, 412, and gypsy share a number
of putative derived characters, including a long ORF 3' to the pol gene, a 1-bp overlap
of the 5' LTR and the tRNA PBS, and -1 frameshifting of the pol gene relative to
the gag gene, as well as the absence of RNA binding sites in the nucleocapsid protein.
While two of these derived characters have evolved elsewhere on the tree (i.e., -1
frameshifting also occurs in Cftl and mag, and RNA binding sites are absent in
Ulysses), the presence of a long ORF 3' to pol and a -1 location of the tRNA PBS 
are unique to this subset of the gypsy group. Third, the putative relationship between 
mag, SURL elements, and possibly micropia is potentially strengthened by the exclusive 
occurrence of two RNA binding sites in the nucleocapsid protein in all of these ele- 
ments. Most retroviruses also possess two RNA binding sites, but in the somewhat 
more closely related plant caulimoviruses there is only a single site. Further support 
for the alliance between mag and SURL elements comes from the observation that 
the number of amino acids separating the two RNA binding sites is identical in these 
elements. Micropia, in turn, has 14 additional amino acids that separate the first and 
second RNA binding sites. The plant elements Del and IFG7 share a number of 
features, such as a single RNA binding site, a single ORF containing the gag and pol 
genes, and a tRNA methionine PBS, but these features appear primitive on the basis 
of their occurrence in Ty3. 

The long LTRs in Ulysses and Del appear homoplastic on the basis of other 
evidence discussed above, whereas the short LTRs in mag are unique to this element. 
Element length ranges from 4,564 bp in mag to 10,653 bp in Ulysses and reflects the 
differences in LTR length. Among other elements, most of the variation results not 
from differences in LTR length but rather from the additional ORF 3' to the pol gene. 

It is interesting that, for the tree in figure 3, all of the animal retrotransposons 
occur on one branch, whereas the two plant elements occur on a second branch. The 
separate clusters of plant and animal retrotransposons suggest that the host phylogeny 
imposes a distinct signature on the phylogeny of the retrotransposons; Flavell (1992) 
previously noted predominantly plant and animal groups for the copia group of re- 
trotransposons as well. Flavell ( 1992) has also characterized the copia group as lacking 
ribosomal frameshifting, whereas in the gypsy group the gag and pol genes are always 
overlapping. However, the presence or absence of overlapping gag and pol genes is 
shown here to exhibit more variation in the gypsy group than was previously recognized. 

In conclusion, our understanding of the phylogeny of the gypsy group of retro- 
transposons is enhanced by considering not only amino acid sequences but also genetic 
features of these elements. Some features (e.g., long 3' ORF) show little or no hom- 
oplasy, whereas others (e.g., type of tRNA PBS) are labile and show much more 
homoplasy. 
Reverse Transcriptase Motifs
in the Catalytic Subunit 
of Telomerase 

Joachim Lingner,* Timothy R. Hughes, Andrej Shevchenko, 
Matthias Mann,† Victoria Lundblad,† Thomas R. Cech†

Telomerase is a ribonucleoprotein enzyme essential for the replication of chromosome 
termini in most eukaryotes. Telomerase RNA components have been identified from 
many organisms, but no protein component has been demonstrated to catalyze telo- 
meric DNA extension. Telomerase was purified from Euplotes aediculatus, a ciliated 
protozoan, and one of its proteins was partially sequenced by nanoelectrospray tandem 
mass spectrometry. Cloning and sequence analysis of the corresponding gene revealed 
that this 123-kilodalton protein (p123) contains reverse transcriptase motifs. A yeast 
(Saccharomyces cerevisiae) homolog was found and subsequently identified as EST2
(ever shorter telomeres), deletion of which had independently been shown to produce 
telomere defects. Introduction of single amino acid substitutions within the reverse 
transcriptase motifs of Est2 protein led to telomere shortening and senescence in yeast, 
indicating that these motifs are important for catalysis of telomere elongation in vivo. In 
vitro telomeric DNA extension occurred with extracts from wild-type yeast but not from 
est2 mutants or mutants deficient in telomerase RNA. Thus, the reverse transcriptase
protein fold, previously known to be involved in retroviral replication and retrotranspo- 
sition, is essential for normal chromosome telomere replication in diverse eukaryotes. 



Replication of chromosome ends, or telo-
meres, requires specialized factors that are
not essential for replication of internal 
chromosome sequences. Conventional 
DNA polymerases cannot fully replicate 
blunt-ended DNA molecules (1) or eukary- 
otic chromosomes (2), which contain 3'- 
terminal extensions. The key to end repli- 
cation is telomerase, a ribonucleoprotein 
(RNP) enzyme that synthesizes the telo- 
meric DNA repeats (3). The template for 
telomeric repeat synthesis is provided by 
the RNA subunit, which has been identi- 
fied, cloned, and sequenced in ciliated pro- 
tozoa (4, 5), yeast (6, 7), and mammals (8). 

A telomerase RNP was first purified 
from Tetrahymena (9). Two protein compo- 
nents, p80 and p95, were specifically asso- 
ciated with the RNA subunit. Human, 
mouse, and rat homologs of Tetrahymena 
p80 have since been identified and found to 
be associated with telomerase (10). Al-
though this evolutionary conservation sug-
gests that p80 and p95 have important roles 
in telomere replication, their specific func-
tions remain unclear. Neither protein has
been reported to be essential for telomere
synthesis, and neither has significant simi-
larity to known polymerases or reverse tran-
scriptases (11).

Telomerase RNP has also been purified 
from Euplotes aediculatus, a hypotrichous cil-
iate only distantly related to Tetrahymena
(12). The hypotrichs present a special op-
portunity for telomere studies because their
macronuclei contain millions of gene-sized
DNA molecules. Each cell has about 8 × 10^7
telomeres (13) and about 3 × 10^5 mol-
ecules of telomerase (12). Measurements of
the specific activity of telomerase through-
out the purification indicated that the ma-
jor activity present in macronuclear ex-
tracts was purified (12). The active telo-
merase complex had a molecular mass of
~230 kD, corresponding to a 66-kD RNA
subunit and two proteins of 123 kD and
~43 kD (12). Photocross-linking experi-
ments implicated the larger protein in spe-
cific binding of the telomeric DNA sub-
strate (14).

Here we characterize the p123 compo-
nent of Euplotes telomerase and show that it
contains sequence hallmarks of reverse tran-
scriptases. Furthermore, it is the homolog of
a yeast protein, Est2p, shown previously to
function in telomere maintenance. Our ge-
netic and biochemical analyses show that
the reverse transcriptase motifs of Est2p are
essential for telomeric DNA synthesis in
vivo and in vitro. We propose that telo-
merase, frequently called "a specialized re-
verse transcriptase," is in fact a reverse tran-
scriptase in terms of its catalytic active site.

Fig. 1. Sequencing of the p123 subunit of telomerase by nanoelectrospray tandem mass spectrometry.
(A) Mass spectrum of the unseparated peptide mixture. All peptides that were sequenced completely or
partially are marked by the letter T or t, respectively (15). The eight peptide ions from which sequence
tags were generated are marked by filled circles. Most unlabeled peaks correspond to trypsin autolysis
products. (B) Tandem mass spectrum of the doubly charged precursor at the mass-to-charge ratio
(m/z) of 830.4 in (A). Interpretation of the fragment ion masses in (B) and comparison with the esterified
form of the peptide allowed the sequence assignment.
Determination of Euplotes p123 sequence. The
genes encoding protein subunits from E. aediculatus were
isolated by reverse genetics. Telomerase was
purified and polypeptides were separated on
SDS-polyacrylamide gels. Amino acid se-
quencing of the trypsin-digested p123 band
was accomplished by nanoelectrospray tan-
dem mass spectrometry (15-17), a minia-
turized form of electrospray (18) that allows
mass spectrometric interrogation of minute
analyte volumes for extended periods of
time due to its low flow rate. No chroma-
tography is needed, because the unfraction-
ated peptide mixture obtained after diges-
tion of the protein in a gel slice is separated
and sequenced in the spectrometer. For
p123, 14 peptides were sequenced de novo
(Fig. 1) (15).
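Fig. 1 refers to a doubly charged precursor at m/z 830.4. Assume the usual positive-mode electrospray ion, [M + 2H]2+. Under that assumption, the neutral peptide mass follows from M = z(m/z) - z(m_p), where m_p is the proton mass (about 1.00728 Da). The sketch below does only that arithmetic. The resulting value is a derived illustration, not a mass reported in the paper, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass M for an ion observed as [M + zH]^z+ at the given m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * mz - charge * PROTON_MASS


def protonated_mass(mz: float, charge: int) -> float:
    """Equivalent singly protonated [M + H]+ value, often used for database matching."""
    return neutral_mass(mz, charge) + PROTON_MASS


if __name__ == "__main__":
    # Doubly charged precursor at m/z 830.4 (Fig. 1B), assuming [M + 2H]2+
    print(round(neutral_mass(830.4, 2), 2))     # 1658.79
    print(round(protonated_mass(830.4, 2), 2))  # 1659.79
```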

Two of the peptide sequences were used
to design degenerate polymerase chain re-
action (PCR) primers (arrows in Fig. 2) to
amplify a portion of the macronuclear
gene encoding p123. A genomic library
was prepared from macronuclear DNA
and screened with this fragment to isolate
the full-length gene (19). The p123 gene
was found to be encoded by a 3279-base
pair (bp) macronuclear chromosome con-
taining an uninterrupted 1031-amino acid
open reading frame. In a Southern (DNA)
blot experiment the PCR fragment hybrid-
ized to a single macronuclear chromosome
of ~3.3 kb (20). The open reading frame
predicts a protein of 122,562 daltons, cor-
responding to the size estimated by SDS-
polyacrylamide gel electrophoresis of puri-
fied protein [120 kD (12)]. More than 150
amino acids identified in the purified
polypeptide by mass spectrometry could be
assigned in the open reading frame (Fig.
2). This includes all 14 peptides that were
completely sequenced. The tandem mass
spectra of 10 additional peptides also
matched the gene sequence through par-
tial sequences or peptide sequence tags.
Reverse transcriptase motifs in Eu- 
plotes pi 23 and its yeast homolog Est2p. 

In a BLAST search of protein databases, Euplotes p123 was found to be most similar to Saccharomyces cerevisiae Est2p (P = 7 × 10⁻⁷) and to a group II intron-encoded reverse transcriptase from the cyanobacterium Calothrix (P = 2 × 10⁻⁴) (22, 23). Yeast Est2p has a predicted molecular mass of 103 kD and, like p123, is very basic (Fig. 3). Although the overall sequence identity of Euplotes p123 and yeast Est2p is only 20% (Fig. 2), sequence similarity (correspondence of acidic, basic, hydrophobic, and hydrophilic amino acids) can be detected over the entire length of the two proteins.

Fig. 3. Block diagrams of p123 and Est2p and comparison of the reverse transcriptase (RT) domains with those of other reverse transcriptases. The spacing of sequence motifs (red) is diagnostic for each reverse transcriptase family (27). In the consensus sequence, abbreviations are as in Fig. 2. The isoelectric point (pI) is the pH at which the protein has no net charge.

[Sequence alignment residue (Ea p123, Sc Est2p; motifs A (3), B' (4), C (5), and D (6) labeled); not legible in this copy.]

Fig. 2 (caption, partly recoverable). [...] in boldface. The PCR primers used to amplify a portion of the gene are indicated by the arrows. Assigned reverse transcriptase motifs [designated by letters (28) or alternatively by numbers in parentheses (27)] are shown in orange, with the most highly conserved amino [...]. [...] of the motifs, h designates a hydrophobic [...] amino acid, and + a positively charged amino acid. The underlined sequences in p123 are the 14 peptides completely sequenced by nanoelectrospray mass spectrometry. The dashed lines below the p123 sequence indicate the peptides whose spectra matched the sequence. One of the peptides contained an acetylated methionine (solid triangle) at its NH2-terminus; it was the NH2-terminal peptide of the protein. The nucleotide sequence of the Euplotes p123 gene has been deposited in GenBank (accession number U95964).
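The distinction drawn above between low overall identity (20%) and similarity that extends over the whole length of p123 and Est2p can be made concrete with a small sketch. It scores two already aligned sequences column by column. The text names the four residue classes but not their memberships, so the `RESIDUE_CLASS` assignments (and the function name `identity_and_similarity`) are hypothetical illustrations, not the authors' scoring method.

```python
# Hypothetical four-class grouping; the text names the classes but not the
# residue assignments, so these memberships are an assumption.
RESIDUE_CLASS = {
    **{aa: "acidic" for aa in "DE"},
    **{aa: "basic" for aa in "KRH"},
    **{aa: "hydrophobic" for aa in "AVLIMFWC"},
    **{aa: "hydrophilic" for aa in "STNQGPY"},
}


def identity_and_similarity(a: str, b: str) -> tuple:
    """Percent identity and percent class similarity over gap-free columns
    of two already aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if "-" not in (x, y)]
    if not columns:
        return 0.0, 0.0
    identical = sum(x == y for x, y in columns)
    # Identical residues also count as similar because they share a class.
    similar = sum(
        x in RESIDUE_CLASS and RESIDUE_CLASS[x] == RESIDUE_CLASS.get(y)
        for x, y in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


print(identity_and_similarity("DKLS", "EKIT"))  # (25.0, 100.0)
```

In the toy pair only one column is identical, yet every column matches by class. This is the kind of pattern that lets two proteins with low identity still be recognized as related.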

The EST2 (ever shorter telomeres) gene was one of four complementation groups identified by screening yeast mutants for reduction in telomere length and senescence (24, 25). Epistasis analysis had indicated that the four EST genes function in the same pathway as TLC1, the gene encoding the telomerase RNA subunit (6), suggesting that the EST genes encode either components of the telomerase or positive regulators of its activity. The homology of yeast Est2p with Euplotes p123, the latter isolated because of its physical association with telomerase RNA and its copurification with telomerase activity, supported the proposal that both proteins [...] of their respective [...].

Fig. 4. [...] reverse transcriptase motifs in Est2p. (A) The 12 amino acids changed to alanines within the reverse transcriptase domain of Est2p are indicated by downward arrows (red, telomerase-conserved residues; black, nonconserved residues). The phenotypic effects of the mutations are indicated by solid triangles (strong mutant phenotype) and open triangles (weak mutant phenotype). The sequence alignment includes members of three other reverse transcriptase families (27). Boldface residues indicate identity of at least two sequences in the alignment. See (50) for amino acid abbreviations. (B) Senescence phenotype of est2 mutants shown by spreading single colonies on plates (51). Photographs were taken after ~75 generations of growth. (C) Telomere length of est2 mutants. Southern blot of genomic yeast DNA, hybridized with a telomere-specific probe (24). Single-copy plasmids carrying the wild-type EST2 gene (lanes 2 and 15), the indicated est2 mutant genes (lanes 3 to 14), or empty vector (lane 1) were transformed into an est2-Δ strain. Genomic DNA was prepared after ~75 generations of growth, at the time of maximal senescence for an est2 null [...]. The bracket and four small arrows indicate telomeric bands, and the two larger arrows indicate the subtelomeric repeat fragments that are amplified late in the growth of est2 [...] strains (24, 28). Five independent transformants of each missense mutant were assayed, one of which is shown. (D) Dominant-negative effect, resulting from overexpression of certain Est2p mutants. A Southern blot of genomic yeast DNA, prepared from a wild-type EST2+ strain transformed with a high-copy plasmid expressing wild-type or the indicated mutant est2 genes, was developed as in (C). In each construct, the EST2 promoter was replaced with the constitutive promoter of the alcohol dehydrogenase (ADH) gene. Lanes 1 and 16, empty vector. Two transformants are shown for each mutation after ~50 generations of growth. Additional growth resulted in further telomere shortening, although this additional length reduction is not sufficient to confer a senescence phenotype (52).

Euplotes p123 contains reverse transcriptase motifs, and the alignment reveals the presence of these motifs in a similar region of Est2p (Fig. 3). The primary sequences of reverse transcriptases are highly divergent: only a few amino acids are absolutely conserved within separate short motifs (26, 27), but these motifs are believed to form a common tertiary fold. Both p123 and Est2p contain these key conserved amino acids, most notably the three invariant aspartates in motifs A and C, which are thought to be directly involved in catalysis (Fig. 2). Conserved motifs are spaced differently in the two major branches of reverse transcriptases: those encoded by retroviruses and long terminal repeat (LTR) retroposons, and those encoded by non-LTR retroposons and group II introns (27). The spacing of sequence motifs in p123 and Est2p resembles that in the latter branch. However, the interval between motifs A and B' in p123 and Est2p is unusually large (Fig. 3), suggesting that these two polypeptides may be members of a previously unknown subcategory.
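Because motif spacing is the diagnostic used here, a minimal sketch of how inter-motif intervals can be read off a sequence may help. It locates each motif with a regular expression and reports the distance between consecutive motif starts. The toy sequence and the `"DG"` and `"QG"` patterns are hypothetical. `Y.DD` echoes the YxDD form of motif C found in many reverse transcriptases (YMDD in HIV-1 RT). Real motif assignment relies on alignment, as in Fig. 3, not on exact patterns.

```python
import re


def motif_starts(seq: str, patterns: dict) -> dict:
    """1-based start of the first match of each motif regular expression."""
    starts = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, seq)
        if match is not None:
            starts[name] = match.start() + 1
    return starts


def motif_intervals(starts: dict, order: list) -> dict:
    """Distance in residues between consecutive motif starts, in sequence order."""
    return {
        f"{a}->{b}": starts[b] - starts[a]
        for a, b in zip(order, order[1:])
        if a in starts and b in starts
    }


# Toy sequence and toy motif patterns (hypothetical; not the real p123/Est2p motifs).
toy = "AADGAAAAQGAAAYMDDA"
toy_patterns = {"A": r"DG", "B'": r"QG", "C": r"Y.DD"}
starts = motif_starts(toy, toy_patterns)
print(starts)                                     # {'A': 3, "B'": 9, 'C': 14}
print(motif_intervals(starts, ["A", "B'", "C"]))  # {"A->B'": 6, "B'->C": 5}
```

An unusually large A-to-B' value, compared against reference families, is the kind of signal the text uses to suggest a new subcategory.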

Requirement of the reverse transcriptase motifs for Est2p function in vivo. The presence of reverse transcriptase motifs in both p123 and Est2p suggests that this region may define the catalytic active site of telomerase. To test the importance of these motifs for Est2p function, we used site-directed mutagenesis to change conserved and nonconserved aspartic acid (D) and glutamine (Q) residues in and around motifs A, B', and C to alanine (A) (Fig. 4A). Each mutant, present on a single-copy ARS CEN plasmid, was tested for in vivo function in a complementation assay. Plasmids were transformed into the est2-Δ strain (Δ designates deletion), in parallel with either the empty vector or an EST2+ plasmid. Transformants were assessed for the senescence phenotype (Fig. 4B) and for chromosome telomere length (Fig. 4C).

Consistent with the prediction that the reverse transcriptase motifs are required for Est2p function, mutation of any of the three conserved aspartates in motifs A and C prevented normal telomerase activity. Transformants expressing these mutant proteins became senescent and had shortened telomeric tracts, phenotypes indistinguishable from those of the null mutant (Fig. 4, B and C). Furthermore, a bypass pathway for telomere maintenance (28) was evident in these three mutant strains. Activation of this alternative pathway occurs as the result of a global amplification and rearrangement of both telomeric G-rich repeats and subtelomeric regions, and has only been observed in est and tlc1 mutant strains with a severe telomere shortening phenotype (24, 29). A feature of this pathway is the amplification of two subtelomeric bands (Fig. 4C); these diagnostic restriction fragments were substantially amplified only in the est2 null mutant and the three proposed active site mutants.

Mutations of amino acids other than the three most conserved aspartates had less severe or no phenotypic effects. The residue Asp536 of motif A is conserved between Est2p and p123, and the D536A mutation (Asp mutated to Ala at position 536) caused substantial telomere shortening and a modest senescence phenotype. Of the conserved residues tested, Gln632 of motif B' was the only one that was functionally insensitive to replacement with alanine. However, this glutamine is not strictly conserved in reverse transcriptases (27), and when it is changed to alanine in human immunodeficiency virus-1 (HIV-1) reverse transcriptase, polymerase activity in vitro is reduced but not completely eliminated (30). In contrast to the phenotypes seen upon mutation of the semiconserved amino acids, mutation of six of the seven nonconserved amino acids tested showed little or no alteration of Est2p function.
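Point mutations in this section are written in the standard compact notation explained for D536A: wild-type residue, 1-based position, replacement residue. The following sketch parses that notation and applies it to a protein sequence. It refuses the substitution if the stated wild-type residue is not actually at that position, which catches numbering errors. The function name and the toy sequence are illustrative, not from the source.

```python
import re

# <wild-type residue><1-based position><replacement>, e.g. "D536A"
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(seq: str, mutation: str) -> str:
    """Return seq with a single point substitution applied, after checking that
    the stated wild-type residue is actually present at that position."""
    match = SUBSTITUTION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside sequence of length {len(seq)}")
    if seq[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {seq[position - 1]}")
    return seq[:position - 1] + replacement + seq[position:]


print(apply_substitution("MDQK", "D2A"))  # MAQK
```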

Two observations indicate that stable Est2 protein was produced in the five est2 mutants with a diminished capacity to complement the est2-Δ strain. First, Myc3-epitope-tagged versions of each mutant protein were visualized immunologically after immunoprecipitation (31). Second, overexpression of each of the five mutant alleles in a wild-type yeast strain with a functional chromosomal EST2+ gene resulted in telomere shortening (Fig. 4D), whereas overexpression of the wild-type EST2 gene had little effect. The dominant-negative phenotype shows that each mutant protein is being made and suggests that excess mutant Est2p can titrate components away from the wild-type telomerase complex.
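The titration argument above can be expressed as a minimal competition model. It assumes, hypothetically, that wild-type and mutant Est2p bind a limiting telomerase component with equal affinity and together saturate it, so wild-type complexes make up a share proportional to wild-type protein. This is an idealization for illustration, not a measured or published quantity.

```python
def wild_type_share(wild_type: float, mutant: float) -> float:
    """Fraction of a limiting partner held by wild-type protein when wild-type and
    mutant protein compete with equal affinity and together saturate it
    (an idealized titration model, not a measured quantity)."""
    total = wild_type + mutant
    if wild_type < 0 or mutant < 0 or total == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return wild_type / total


# Overexpressing an inactive mutant tenfold over wild type leaves ~9% of the
# limiting partner in wild-type complexes; overexpressing wild type leaves 100%.
print(wild_type_share(1.0, 10.0))
print(wild_type_share(10.0, 0.0))  # 1.0
```

Under this model, overexpressed wild-type protein cannot lower the active fraction. That is consistent with the small effect of wild-type EST2 overexpression reported above.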

Requirement of Est2p for telomerase activity in vitro. If Est2p is the catalytic protein subunit of telomerase, then telomerase activity should be abolished in est2 [...] developed with extracts fractionated by glycerol gradient centrifugation (32). Telomerase-containing fractions were identified by detection of the RNA subunit on Northern blots (Fig. 5). Yeast telomerase sedimented as a 19S to 20S particle, substantially faster than the deproteinized telomerase RNA (~17S). Telomerase-containing glycerol gradient fractions were pooled, concentrated, and tested for the ability to elongate a single-stranded telomeric oligonucleotide (Fig. 6). An activity was detected in wild-type extracts that had the characteristics of telomerase. It was dependent on the presence of oligonucleotide substrate and fractionated extract (Fig. 6A, lanes 1 to 3). Addition of T and G residues occurred in an ordered manner consistent with the expected alignment of substrate and RNA template (Fig. 6A, lanes 5 and 6). The activity was sensitive to low concentrations of ribonuclease (RNase) A and was not stimulated by adenosine triphosphate (ATP) (Fig. 6B). These characteristics, in addition to the observed single round of extension of primer (Fig. 6A), are similar to those of the telomerase activity described by Blackburn and co-workers (33). A different activity described as telomerase by Lue and Wang (34) gives rise to long products and is stimulated by ATP. This latter activity was not detectable in our telomerase-containing glycerol gradient fractions.

Fig. 5. Sedimentation of [...] was fractionated on a glycerol gradient (32), and telomerase RNA was detected by Northern blotting (bottom) and its concentration quantified on a PhosphorImager [...]. [...] U1 snRNP [...]. Fractions [...] activity assays [...]. [...] RNA sediments at ~17S. The [...] 7.6S (alcohol dehydrogenase), 11.3S (catalase), 17.3S (apoferritin), and 19.3S (thyroglobulin). [Plot x-axis: fraction number.]
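Values such as "19S to 20S" are typically read off a gradient by comparison with marker proteins of known sedimentation coefficient, such as those listed in the Fig. 5 legend. The text does not say how S values were assigned. The sketch below assumes simple linear interpolation between markers. The marker S values come from the legend, but the fraction numbers attached to them are hypothetical.

```python
def interpolate_s(fraction: float, markers: list) -> float:
    """Estimate a sedimentation coefficient (S) for a gradient fraction by linear
    interpolation between marker proteins given as (fraction, S) pairs."""
    points = sorted(markers)
    for (f0, s0), (f1, s1) in zip(points, points[1:]):
        if f0 <= fraction <= f1:
            return s0 + (s1 - s0) * (fraction - f0) / (f1 - f0)
    raise ValueError("fraction lies outside the marker range")


# Marker S values are from the Fig. 5 legend; the fraction numbers are hypothetical.
markers = [(4, 7.6), (8, 11.3), (14, 17.3), (16, 19.3)]
print(round(interpolate_s(15, markers), 2))  # 18.3
```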

A telomerase RNA template mutation that alters the specificity of nucleotide incorporation to produce a Hae III restriction site (6) provides an additional test for the authenticity of the in vitro telomerase assay. An extract of this TLC1-1(HaeIII) mutant, fractionated on a glycerol gradient, gave the predicted extension of the telomeric oligonucleotide only in the presence of deoxycytidine triphosphate (dCTP) (Fig. 6C, lanes 6 to 8), a nucleotide that has no effect on extension by a wild-type extract (Fig. 6C, lanes 2 and 3). This change in nucleotide specificity supports the dependence of the assay on the TLC1 RNA. Because the TLC1-1(HaeIII) strain also undergoes senescence (29), this result also provides confidence that telomerase activity can still be detected in senescing cells, as long as they are not subcultured too extensively.
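The Hae III test works because incorporating dC from the mutant template creates the enzyme's recognition sequence, GGCC (cut between GG and CC), in the synthesized DNA. A short scan makes the point. The product sequences below are hypothetical: the text does not give the wild-type or TLC1-1(HaeIII) template sequences.

```python
HAEIII_SITE = "GGCC"  # Hae III recognition sequence; the enzyme cuts GG^CC


def haeiii_sites(dna: str) -> list:
    """0-based start of every GGCC occurrence in a DNA sequence."""
    dna = dna.upper()
    width = len(HAEIII_SITE)
    return [i for i in range(len(dna) - width + 1) if dna[i:i + width] == HAEIII_SITE]


# Hypothetical products: incorporating C next to a G run creates a site.
print(haeiii_sites("TTGGGGTT"))    # []
print(haeiii_sites("TTGGGGCCTT"))  # [4]
```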

We then assayed fractionated extracts from est2-Δ and tlc1-Δ strains for telomerase activity (Fig. 6D). As expected, no activity was detectable in tlc1-Δ yeast, which has the gene for telomerase RNA deleted. In the est2-Δ strain, telomerase RNA was still assembled into an RNP, as assessed by glycerol gradient centrifugation and Northern blotting (32), but telomerase activity was completely absent. This indicates that Est2p is essential for telomerase activity. As described above, the absence of activity is not simply a secondary consequence of senescence. We also measured telomerase activity in extracts from est2-Δ strains and from strains expressing two of the proposed active site mutants, in the presence of the chain-terminating analog ddGTP (Fig. 6E). According to the proposed primer-template alignment, extension should terminate after addition of two nucleotides. A practical advantage is the higher signal-to-noise ratio obtained when all products are concentrated in one or two bands. Again, activity was dependent on functional TLC1 and EST2 genes.
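The expectation that extension with dTTP and ddGTP "should terminate after addition of two nucleotides" follows from copying the template one base at a time. The sketch below models that logic. It assumes a hypothetical template whose next bases to be copied read 3'-ACAC-5', because the real yeast template sequence is not given in this text. It also assumes that a supplied ddNTP replaces the corresponding dNTP; competition between the two is not modeled.

```python
from typing import Optional

# RNA template base -> DNA nucleotide incorporated opposite it
PAIR = {"A": "T", "U": "A", "C": "G", "G": "C"}


def extend(template_3to5: str, dntps: set, ddntp: Optional[str] = None) -> str:
    """Nucleotides added to a primer annealed just 3' of `template_3to5`
    (template written 3'->5' in the direction it is copied).

    Synthesis stops when the next required nucleotide is unavailable, or right
    after a chain-terminating dideoxy analog is incorporated. Simplification:
    if ddntp is given, it is assumed to replace the corresponding dNTP."""
    added = []
    for base in template_3to5:
        needed = PAIR[base]
        if needed == ddntp:
            added.append(needed)  # no 3'-OH after a ddNMP, so the chain ends
            break
        if needed not in dntps:
            break  # required dNTP missing: synthesis stalls
        added.append(needed)
    return "".join(added)


# Toy template (hypothetical): the next bases to copy read 3'-ACAC-5'.
print(extend("ACAC", {"T", "G"}))  # TGTG
print(extend("ACAC", {"T"}))       # T
print(extend("ACAC", {"T"}, "G"))  # TG
```

The three calls mirror the logic of Fig. 6A: both dNTPs allow continued copying, dTTP alone gives one nucleotide, and dTTP plus ddGTP stops after two.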

Telomerase structure. The presence of a reverse transcriptase domain in the catalytic subunit of telomerase provides a framework for exploring the structure and mechanism of this enzyme. Reverse transcriptases have been studied in great detail, and the three-dimensional structure of [...] reverse transcriptase has been solved (35). The structure can be compared with a right hand with fingers, palm, and thumb, with the active site residing in the palm (36). A model for telomerase structure based on that of HIV-1 reverse transcriptase (HIV-1 RT) is shown in Fig. 7 with the telomerase RNA and a telomeric DNA substrate superimposed.

The catalytic subunit of telomerase has several features that distinguish it from other reverse transcriptases. Telomerase uses only a small portion of its RNA subunit as a template. The borders of this template must somehow be recognized. Furthermore, during processive synthesis of telomeric repeats the substrate translocates from one end of the template to the other by an as yet unknown mechanism. The large gap between motifs A and B' of telomerase p123 and Est2p indicates an unusual finger domain structure. In HIV-1 RT this domain may be involved in template strand binding (35, 36); whether and how it contributes to [...] telomerase RNP remain to be investigated. Finally, the telomerase protein is stably associated with its RNA subunit, as shown by our isolation of the Euplotes p123-RNA complex and by coimmunoprecipitation of the yeast RNA subunit with Est2p (31). This last feature distinguishes telomerase from the retroviral and LTR retroposon reverse transcriptases, but is similar to some mitochondrial and group II intron-encoded reverse transcriptases that also form complexes with their RNA templates (37).

Fig. 6. In vitro functional analysis of reverse transcriptase motifs in Est2p. Telomerase was partially purified by glycerol gradient centrifugation and assayed for the ability to extend a telomeric DNA substrate (32). In the assay, [α-32P]dTTP was included to visualize products elongated by 1, 2, 3, or 4 nucleotides (+1, +2, +3, or +4). (A) The telomerase RNA template region maximally base-paired to the DNA substrate is indicated schematically. Product lengths were determined relative to the same DNA substrate extended by one nucleotide at its 3' end by reaction with [α-32P]ddTTP and terminal deoxynucleotidyl transferase (lane 4). [...] nucleotides were added in the presence of dGTP and dTTP (lane 3), one nucleotide in the presence of [...] nucleotides in the presence of dTTP and the chain-terminating analog ddGTP (lane 6). Oligo, oligonucleotide. (B) Effect of RNase A and ATP on telomerase activity. Standard reaction (lane 1), standard reaction plus 1 mM ATP and 1 mM additional MgCl2 (lane 2), and standard reaction plus RNase A at 0.1 ng/μl (lane 3), 1 ng/μl (lane 4), and 10 ng/μl (lane 5). (C) Specificity of nucleotide incorporation dictated by the RNA template sequence. Product lengths were determined relative to DNA markers that had been extended by [α-32P]ddTTP (lane 1) or [α-32P]ddCTP (lanes 5 and 9) as in (A). Note that these two markers had slightly different mobilities on the polyacrylamide gel. The mutant TLC1-1(HaeIII) telomerase RNA template is indicated with the substrate bound in the most stable register. Consistent with this alignment and the mutated template sequence, efficient extension required the presence of dCTP (lanes 6 and 7). Telomerase in extract from TLC1-wt cells (WT) [see (A) for template sequence] was not influenced by the presence of dCTP (lanes 2 and 3). (D) Requirement of functional EST2 and TLC1 products for telomerase activity. Fractionated extracts from wild-type (lanes 1 and 2) and the indicated mutant strains (lanes 3 to 6) (32) were tested at two extract concentrations. Reactions [...] of telomerase fraction, and reactions 2, 4, and 6 contained 20%. (E) Alleviation of telomerase activity by active site mutations in Est2p. All assays included the chain terminator ddGTP (100 μM). The reactions contained 10% (v/v) of telomerase fraction (lanes 1 to 6) or 5% of each of the indicated fractions (lanes 7 to 9). The results of the mixing experiment (lanes 7 to 9) indicate that the absence of activity is not due to an inhibitor in the mutant extracts.

Reverse transcriptase essential for chromosome replication in diverse eukaryotes. Reverse transcriptases have not previously been considered essential for normal cell physiology. Initially discovered as retroviral enzymes that catalyze the defining RNA-to-DNA step of retroviral replication (38), they were later found to mediate the transposition of DNA elements within eukaryotic genomes through an RNA intermediate (39). Reverse transcriptases are also present in some prokaryotes (40) and in Neurospora mitochondria (41), where they replicate genetic elements that are nonessential to their "host." Our discovery that a structurally related enzyme is essential for chromosome replication and cell division provides another example of the opportunism of nature: once a useful protein motif is stumbled upon, natural selection promotes its exploitation in diverse [...].

The evolutionary relationship between telomerase and the other reverse transcriptases is intriguing. It is well established that retroviruses acquired oncogenes such as v-src, v-abl, v-ras, and v-fos from cellular genomes. According to Temin's protovirus hypothesis, retroviruses also acquired their reverse transcriptase gene from normal cells, where the enzyme presumably contributed to some normal cellular process (42). Could this cellular source have been the telomerase p123/EST2 gene, which mutated so that the protein product used an exogenous rather than an intrinsic RNA template? Alternatively, telomerase and the [...] transposons and retroviruses may all be descendants of an ancestral protein that emerged from an "RNA world" (43).

Telomere replication in the fruit fly Drosophila has been mysterious because this organism does not have short repeated telomeric sequences and presumably no telomerase. Rather, the non-LTR retroposons HeT-A and TART cap the chromosome ends (44). The TART reverse transcriptase is closely related to p123 and Est2p, which suggests that the Drosophila telomere replication machinery may in fact not be so different from that of other eukaryotes (45).

We have no satisfactory explanation for the lack of correspondence between the Euplotes and yeast p123/Est2p proteins and the Tetrahymena p80 or p95 protein (9). The small protein subunit of Euplotes telomerase (p43) also shows no similarity to the Tetrahymena proteins (46), and the complete yeast genome sequence does not reveal obvious p80 and p95 homologs. There are three possible explanations: (i) Tetrahymena may have a different telomerase in which p80 and p95 provide the active [...] than once in evolution); (ii) Tetrahymena may have two telomerases, one containing p80 and p95 and one (unisolated) containing a p123/Est2p homolog (for example, one telomerase for de novo telomere formation during macronuclear development and one for telomere replication); (iii) the Tetrahymena p80-p95-RNA complex may not be an active enzyme but may require a p123/Est2p subunit that was underrepresented upon purification of the particle.




Mass spectrometric methods have recently become very successful for the identification of proteins whose genes are already partially or completely contained in sequence databases (47). The sequencing of more than 150 amino acids of the p123 telomerase subunit, at protein amounts too low for chemical methods, shows that mass spectrometry is now also valuable for sequencing previously unidentified proteins.
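Database identification of the kind mentioned here rests on comparing measured peptide masses with masses predicted from candidate sequences. The sketch below shows that prediction step. It digests a sequence with the classic trypsin rule (cleave after K or R, not before P) and computes the singly protonated monoisotopic mass [M+H]+ from standard residue masses. This is an illustration only, not the software used in the study. It ignores modifications, such as the acetylated N-terminal methionine noted in the Fig. 2 caption, and it ignores missed cleavages.

```python
# Monoisotopic residue masses in daltons (standard values; I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for the singly protonated ion


def tryptic_peptides(protein: str) -> list:
    """Cleave after K or R unless the next residue is P (classic trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """[M+H]+ monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for pep in tryptic_peptides("AKPGRSK"):  # toy sequence
    print(pep, round(mh_plus(pep), 4))
```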

Telomerase activation accompanies the immortalization of cultured mammalian cells and is also a common property of human tumor cells (48). Thus, telomerase is considered to be a potential target for the development of tumor-specific drugs. Certain reverse transcriptase inhibitors developed as anti-HIV drugs have already been tested against telomerase with some success (49). The finding that the telomerase active site is related to that of known reverse transcriptases is expected to stimulate such efforts.
Functional analysis of HIV-1 reverse transcriptase motif C: site-directed mutagenesis and metal cation interaction.



Departamento de Virus y Cancer, Institute Nacional de Salud Publica, 
Cuernavaca, Morelos, Mexico. veronica@ibt.unam.mx 

Motif C, present in all polymerases, has been proposed to be part of the catalytic and metal binding site of the enzyme, suggesting that polymerases have a common origin. Previously, we have shown that the metal ion manganese induces alterations in nucleotide substrate specificity in some polymerases. However, it is not known if the active site responsible for incorporation of nonspecific substrates is the same as that which incorporates specific ones. Here we show that manganese enables HIV-1 reverse transcriptase (RT) to incorporate rNTPs using RNA as a template, thus behaving as an RNA replicase. We also show that the mutation D186H in motif C strongly affects the natural DNA polymerase activity and that the RNA replicase activity becomes undetectable, suggesting that both activities depend on the same active site. This mutation changes the metal ion preference: the mutant RT retains only 0.5% of the wild-type DNA polymerase activity in the presence of magnesium but 1.6% of the same activity in the presence of manganese. This variation in cation preference suggests that residue D186 is part of the metal binding site. Since residue D186 of motif C is essential for both activities and appears to be involved in the binding of an important cation needed for the specific activity, our results support the idea of a common origin for all polymerases, from an ancestral unspecified polymerase containing at least motif C.
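The shift in cation preference can be put as a single ratio: the mutant keeps 1.6% of wild-type activity with manganese and 0.5% with magnesium, so it retains about 3.2 times more of its wild-type activity with manganese. Each percentage is relative to wild type under the same cation, so this ratio compares relative activities, not absolute ones. The helper name below is illustrative.

```python
def fold_change(residual_pct_mg: float, residual_pct_mn: float) -> float:
    """How many times more of its wild-type activity the mutant retains with
    Mn2+ than with Mg2+ (inputs are percent of wild type under each cation)."""
    if residual_pct_mg <= 0:
        raise ValueError("Mg2+ residual activity must be positive")
    return residual_pct_mn / residual_pct_mg


print(round(fold_change(0.5, 1.6), 2))  # 3.2
```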
Rapid Identification of All Known Retroviral Reverse Transcriptase Sequences with a Novel Versatile Detection Assay
We have developed a highly sensitive, universal assay that allows detection as well as identification of all known retroviral reverse transcriptase (RT)-related nucleic acids in a biological sample by a single two-step experiment. The assay combines polymerase chain reaction (PCR) and reverse dot-blot hybridization (RDBH) [...] of immobilized synthetic retrovirus-specific oligonucleotides and two sets of mixed oligo primers (MOPs). These primers were derived from highly conserved motifs found in all known reverse transcriptase genes. The PCR/RDBH assay was used for qualitative analyses of human endogenous retrovirus (HERV) transcription in peripheral blood mononuclear cells (PBMCs) and in particles released by the human mammary carcinoma-derived cell line T47D. Sensitivity was further demonstrated by detection of as few as 10 copies of pig endogenous retrovirus (PERV) DNA in human cDNA samples. Therefore, this assay is particularly useful for the identification of retroviral sequences in xenografts as well as in recipients of xenografted tissues and organs. Moreover, it is a valuable tool to detect retroviral transcripts and particles in cell cultures used for production of therapeutic polypeptides. The assay is further suitable for monitoring vector preparations used in human gene therapy to exclude transfer of copackaged endogenous retroviruses into target cells.



INTRODUCTION 

The genomes of all vertebrates contain a wide spectrum of endogenous retroviruses (ERVs) and reverse transcriptase (RT)-related sequences. For example, human ERVs (HERVs) are estimated to comprise at least 1-2% of the human genome (1,2). Although most of these sequences are assumed to be defective, some retain certain biological activities and thus represent a reservoir of retroviral genes with pathogenic potential. Characterization of particles released by the human breast cancer-derived cell line T47D revealed that complementation between several expressed HERVs can lead to pseudotype particles packaging retroviral RNA of different origin (3,4). Thus, activation and expression of (H)ERV may result in undesired mobilization of genetic material of retroviral origin and may interfere with the safe production of therapeutic polypeptides, with safe human gene therapy, and with xenotransplantation.

Cross-packaging of ERVs to a high level is observed in murine packaging cells commonly used for retroviral vector preparation (6). Cross-packaged ERV transcripts may be transmitted to recipient cells, leading to unwanted integration events, or may recombine with the vector, forming new infectious retroviruses. This is of high concern for the safety of retrovirus-mediated human gene therapy. The risk of acquiring animal ERVs through xenotransplantation also requires attention, since xenografts from baboons and pigs are currently being discussed for human use. In vitro experiments indicate that xenotropic ERVs such as murine or cat retroviruses can propagate considerably in human cells (7). Baboon endogenous retrovirus (BaEV) readily infects human cells in culture. The same holds true for porcine ERV and several human cell types (5,8-10). Although these viruses do not show a pathogenic effect in their natural hosts, the situation may change when they are transferred to immunosuppressed humans, in whom the virus might replicate to high titers (11,12).

With respect to prospective practical applications, we have established a universal detection assay that allows rapid testing of biological samples for undesired mobilization of retroviral sequences. With this method, all known reverse transcriptase-related sequences of human and animal origin can be simultaneously identified in a single two-step experiment.



MATERIALS AND METHODS 

RNA preparation 

Total RNA was extracted from peripheral blood mononuclear cells (PBMCs) of healthy blood donors according to a guanidinium isothiocyanate-cesium chloride (GIT/CsCl) ultracentrifugation protocol (13) and dissolved in diethylpyrocarbonate (DEPC)-treated distilled water. Subsequently, mRNA was purified with Dynabeads paramagnetic particles as described by the manufacturer (Dynal, Hamburg, Germany). Nucleic acid concentrations were calculated by spectrophotometry at 260 nm. To check for genomic DNA contamination, 50 ng of each mRNA preparation was tested in a polymerase chain reaction (PCR) with mixed oligonucleotide (oligo) primers (MOPs), omitting the reverse transcription step. Only preparations negative for DNA traces were used for PCR. Samples positive for DNA contamination were treated with RNase-free DNase (100 units/μg; Roche Diagnostics, Mannheim, Germany) in 100 mM sodium acetate (pH 5.0), 5 mM MgSO4 until the control PCR was negative.
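The spectrophotometric step can be sketched as a short calculation, together with the volume needed for the 50-ng DNA-contamination check. The text does not state the conversion factor. The sketch assumes the conventional value for single-stranded RNA (an A260 of 1.0 in a 1-cm cuvette corresponds to about 40 μg/ml), and the absorbance reading in the example is hypothetical.

```python
# Conventional conversion for single-stranded RNA: A260 of 1.0 (1-cm path)
# corresponds to ~40 ug/ml. The text does not state which factor was used.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration in ng/ul (numerically equal to ug/ml)."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_ng(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (ul) holding target_ng, e.g. the 50 ng used for the DNA check PCR."""
    return target_ng / conc_ng_per_ul


conc = rna_ng_per_ul(0.25, dilution_factor=10)  # hypothetical reading
print(conc, volume_for_ng(50, conc))  # 100.0 0.5
```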

Primers and reverse dot-blot oligonucleotides 

For PCR, two different mixed oligonucleotide primer (MOP) sets, MOP-1 and MOP-2, have been designed. The primer sequences correspond to highly conserved regions present in the reverse transcriptase (RT) genes of all known human endogenous and exogenous retroviruses, as well as related animal retroviruses (Fig. 1) (14-16). MOP-1 primers preferentially amplify human and mammalian type A, B, and D reverse transcriptase sequences, whereas MOP-2 primers allow the amplification of human and mammalian type C-related RT sequences as well as RT sequences of human exogenous retroviruses such as HIV, HTLV, HRV5, and foamy viruses (Fig. 1). For each primer set a separate PCR was performed. The amplification products were then mixed in equimolar amounts and used as probe in the reverse dot-blot hybridization.

MOP-1 forward   GAAGGMCCARAGTNYTDYCHCMRGGH (as legible in the scan)
MOP-1 reverse   [...]NWDDMKDTYATCMAYRWA (5' portion illegible in the scan)
MOP-2 forward   GAAGGATCCTKKAMMSKVYTRCYHCARGGG
MOP-2 reverse   GAAGGATCCMDVHDRBMDKYMAYVYAHKKA

FIG. 1. (A) Localization of conserved amino acid domains [...] coding region of reverse transcriptases [...]. The core homology regions VLPQG and YM/VDDI/V/LL were used to design the mixed oligonucleotide primer sets MOP-1 and MOP-2. (B) Primer set MOP-1 was optimized for amplification of type A, B, and D retroviruses, whereas primer set MOP-2 [...] for favored priming of retrovirus type C-related templates as well as human exogenous retroviruses such as HIV, HTLV, HRV5, and foamy retroviruses. Standard single-letter abbreviations (IUPAC) are used to describe degenerate nucleotides. Both forward and reverse primers are oriented in the 5'-to-3' direction with respect to the DNA strand to be amplified.
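The MOP primers are written with IUPAC degenerate codes, so each primer is really a mixture of many sequences. A minimal sketch using the standard IUPAC code table can count how many distinct sequences a degenerate primer encodes and test whether a given target is one of them. The function names are illustrative, and the example strings are toy inputs rather than the MOP sequences.

```python
from math import prod

# IUPAC degenerate nucleotide codes -> bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[code]) for code in primer.upper())


def matches(primer: str, target: str) -> bool:
    """True if target (same length) is one of the sequences the primer encodes."""
    return len(primer) == len(target) and all(
        base in IUPAC[code] for code, base in zip(primer.upper(), target.upper())
    )


print(degeneracy("RYN"))                         # 16
print(matches("RYN", "GTA"), matches("RYN", "CTA"))  # True False
```

High degeneracy widens the range of RT sequences a primer mixture can anneal to. It also dilutes each individual primer species, which is one reason degenerate sets are often split by target group, as MOP-1 and MOP-2 are here.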

To design the oligonucleotides bound on the reverse dot-blot filters, databases were screened for RT-related sequences. RT sequences of exogenous and endogenous retroviruses were classified according to the current nomenclature and further subgrouped with respect to their degree of nucleotide homology (data not shown). Representative members of all retrovirus families published so far were selected (Table 1), and the sequence information corresponding to the 90-bp stretch between the highly conserved RT motifs LPQG and YM/VDDI/V/LL (16) was used for synthesis of a pair of oligonucleotides, each 45 nucleotides in length. Thus, each dot consists of an equimolar mixture of two 45-mer oligonucleotides covering the internal sequence of the MOP amplicon.
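Covering a 90-bp internal region with two 45-mers can be sketched as cutting it into consecutive, non-overlapping halves. This is an assumption: the text says only that the pair covers the internal sequence, not how the two oligonucleotides are positioned or on which strand they lie. The 90-nt region in the example is a placeholder, not a real retroviral sequence.

```python
def split_into_probes(region: str, probe_len: int = 45) -> list:
    """Cut an internal amplicon region into consecutive, non-overlapping probes."""
    if probe_len <= 0 or len(region) % probe_len != 0:
        raise ValueError("region length must be a positive multiple of probe_len")
    return [region[i:i + probe_len] for i in range(0, len(region), probe_len)]


region = "ACGT" * 22 + "AC"  # hypothetical 90-nt stand-in for a real RT sequence
pair = split_into_probes(region)
print(len(pair), [len(p) for p in pair])  # 2 [45, 45]
```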

Reverse transcription and MOP PCR 

Five hundred nanograms of each DNA-free mRNA preparation was reverse transcribed in a volume of 50 μl containing 20 mM Tris-HCl (pH 8.4), 10 mM dithiothreitol (DTT), 50 mM KCl, 2.5 mM MgCl2, deoxynucleoside triphosphates (dNTPs; 0.5 mM each), 10 units of RNasin (Promega, Madison, WI), 30 pmol of random hexamer oligonucleotides (Promega), and 20 units of murine leukemia virus (MuLV) reverse transcriptase (GIBCO-BRL, Gaithersburg, MD) at 37°C for 1 hr. Subsequently, reverse-transcribed samples were denatured for 5 min at 95°C and stored at -20°C.
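Setting up a recipe like this one from concentrated stocks is a C1·V1 = C2·V2 calculation. The sketch below uses the final concentrations and the 50-μl volume from the text. The stock concentrations (25 mM MgCl2, 10 mM of each dNTP, 100 mM DTT) are hypothetical examples, because the text does not state which stocks were used.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """Volume of stock to add so that C1*V1 = C2*V2 (concentrations in the same units)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be at least as concentrated as the final mix")
    return final_conc * total_ul / stock_conc


# Final concentrations from the reverse transcription recipe (50 ul total);
# the stock concentrations are hypothetical examples.
recipe = {"MgCl2": (2.5, 25.0), "each dNTP": (0.5, 10.0), "DTT": (10.0, 100.0)}
for name, (final_mm, stock_mm) in recipe.items():
    print(name, stock_volume_ul(final_mm, stock_mm, 50.0), "ul")
```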

MOP amplification was carried out in a total volume of 50 µl containing 1/20 of the cDNA reaction, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 0.001% gelatin, 50 pmol of each mixed oligonucleotide primer set, a 0.25 mM concentration of each deoxynucleoside triphosphate, and 1.25 units of Taq polymerase (GIBCO-BRL). Amplification was performed in a DNA thermal cycler (Perkin-Elmer Cetus, Emeryville, CA). Cycle parameters were as follows: 30 cycles of 94°C for 30 sec, 50°C for 4 min, and 72°C for 1 min, followed by a final extension step of 7 min at 72°C. We chose 50°C for the annealing step because it corresponds to the annealing temperature of the degenerate primers with the highest A+T content. A control reaction without template was carried out to detect product carryover and any traces of contaminating genomic DNA in the solutions used. Amplification products were analyzed on preparative 2.5% Tris-borate-EDTA (TBE) agarose gels and stained with ethidium bromide. Bands of interest between 100 and 150 bp, corresponding to amplified retroviral reverse transcriptase sequences, were excised from the gel and purified with a GeneClean II kit (Bio 101, Vista, CA). For reverse dot-blot hybridization, about 50 ng of the purified fragments was labeled with [α-32P]dATP (3000 Ci/mmol), using a Megaprime DNA labeling kit (Amersham Pharmacia Biotech, Little Chalfont, England).
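As a quick check on the cycling program above, the total hold time can be calculated from the step durations. The sketch below encodes the program as data. It assumes that ramp times between temperatures can be ignored, so an actual run on the instrument would take somewhat longer.

```python
# Sketch: total hold time of the MOP cycling program described above.
# Ramp times between temperatures are ignored (assumption), so a real run takes longer.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


MOP_CYCLE = (Step(94, 30), Step(50, 4 * 60), Step(72, 60))
N_CYCLES = 30
FINAL_EXTENSION = Step(72, 7 * 60)


def hold_minutes(cycle, n_cycles: int, final: Step) -> float:
    """Sum of all hold times, in minutes, for n_cycles repeats plus the final step."""
    per_cycle = sum(step.seconds for step in cycle)
    return (per_cycle * n_cycles + final.seconds) / 60


print(hold_minutes(MOP_CYCLE, N_CYCLES, FINAL_EXTENSION))  # 172.0
```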

Preparation of filter arrays 

Retrovirus-specific synthetic oligonucleotides corresponding to the 90-bp internal part of the amplified RT sequence were synthesized and purified by high-performance liquid chromatography (HPLC) by Birsner & Grob-Biotech GmbH (Freiburg, Germany). For each retroviral sequence, 100 pmol of a pair of 45-mer oligonucleotides mixed in equimolar amounts was diluted in 5X SSC (1X SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and spotted onto ZETAprobe GT blotting membranes (Bio-Rad, Hercules, CA), using a Minifold I dot blotter SRC96D (Schleicher & Schuell, Dassel, Germany). Filters were rinsed in 2X SSC, and the oligonucleotides were irreversibly immobilized by UV cross-linking (Stratalinker; Stratagene, La Jolla, CA). Filters were allowed to air dry.
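The dot codes in Table 1 (1A through 8L) fit a 96-position manifold with 8 rows and 12 columns. The sketch below shows how such codes could be placed on a grid to plan the spotting. The row/column interpretation is an assumption: this section does not state how the codes map onto the SRC96D positions. The sequence names come from Table 1.

```python
# Sketch: placing Table 1 dot codes on a 96-position (8 x 12) manifold layout.
# Assumption: the digit (1-8) gives the row and the letter (A-L) gives the column;
# the paper does not state this mapping explicitly.
import re

ROWS = 8
COLUMNS = "ABCDEFGHIJKL"


def dot_position(code: str) -> tuple:
    """Map a dot code such as '4H' to a zero-based (row, column) pair."""
    match = re.fullmatch(r"([1-8])([A-L])", code.strip().upper())
    if match is None:
        raise ValueError(f"not a valid dot code: {code!r}")
    return int(match.group(1)) - 1, COLUMNS.index(match.group(2))


def layout(assignments: dict) -> list:
    """Build an 8 x 12 grid of labels from {dot code: sequence name}; '.' = unused."""
    grid = [["." for _ in COLUMNS] for _ in range(ROWS)]
    for code, name in assignments.items():
        row, col = dot_position(code)
        grid[row][col] = name
    return grid


example = {"1A": "HML-1", "2A": "HERV-K10", "4E": "ERV9", "7F": "PERV"}
for row in layout(example):
    print(" ".join(f"{cell:>8}" for cell in row))
```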

Hybridization procedures

Standardized hybridization conditions were as follows. Prehybridization of reverse dot-blot filters was performed in sealed plastic bags in 0.25 M Na2HPO4 (pH 7.2), 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 50°C for at least 3 hr.
For hybridization 5 X 10 s cpm of labeled probe per milliliter 
of hybridization volume was added to the same solution and in- 
cubated for 16 hr under the same conditions. The membranes 
were then washed twice (30 min each) at 50°C in 40 mM Na2HPO4 (pH 7.2), 5% SDS, 1 mM EDTA, and then twice in 40 mM Na2HPO4 (pH 7.2), 1% SDS, 1 mM EDTA. Filter membranes were exposed to X-ray film (BioMax; Eastman Kodak, Rochester, NY).



RESULTS 

Design of mixed oligo primers 

The pol genes of all retroviruses and most retroelements share highly conserved core homology regions (refs. 15-17). Two of the most conserved amino acid regions are the VLPQG and YV/MDDI/V/LL motifs (Fig. 1). The two motifs are about 90 base pairs apart, and the intervening region is considerably less conserved between different retrovirus families. Following a general principle outlined by Shih et al. (ref. 15), we derived from these motifs universal PCR primers that allow amplification of all known retroviral RT-related templates. After comparing the RT core homology regions of all human and mammalian endogenous and exogenous retroviral sequences available in the database, we designed two sets of degenerate pol primers. Primer set MOP-1 was optimized for amplification of type A, B, and D retroviruses. Primer set MOP-2 was selected to favor priming of type C-related retrovirus templates and of human exogenous retroviruses such as HIV, HTLV, HRV5, and foamy retroviruses. A 9-base extension featuring a clamp and a BamHI restriction site was incorporated at the 5' end of each primer. Because the sequence extension stabilizes primer-template binding, the products generated after the first PCR cycle are amplified more efficiently in the remaining cycles. The amplification reaction can therefore be considered a "multiplex" PCR under moderate primer-template annealing conditions. Retroviral templates in the reaction mixture can be amplified sufficiently with MOP-1 and MOP-2 primers even when an exactly matching primer is not available. Moreover, the restriction site allows rapid product cloning for sequence verification or for characterization of novel RT-related sequences. PCR conditions were optimized with respect to the amount of primers, annealing time, and annealing temperature (data not shown).
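Degenerate primer sets like MOP-1 and MOP-2 are written with IUPAC ambiguity codes, and each code multiplies the number of distinct oligonucleotides in the mixture. The MOP sequences are not given in this section. The sketch below therefore uses a hypothetical primer, loosely modelled on codons for the LPQG motif, to show how degeneracy is counted and expanded.

```python
# Sketch: counting and enumerating the variants encoded by a degenerate primer
# written in IUPAC nucleotide codes. The example primer is hypothetical (loosely
# modelled on codons for the LPQG motif) and is NOT a MOP-1 or MOP-2 sequence.
from itertools import product
from math import prod

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate primer mixture."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list:
    """All concrete sequences represented by the degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


example = "YTNCCNCARGG"  # hypothetical, see lead-in
variants = expand(example)
print(degeneracy(example), len(variants), variants[0])  # 64 64 CTACCACAAGG
```

With 64 variants, any single exact-match oligonucleotide makes up only a small share of the primer pool. This is consistent with the point above that templates must also be amplified by imperfectly matching primers.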



Table 1. Classification of Retrovirus-Specific Oligonucleotides and Dot Codes

Retrovirus family                     Member          Sequence                  Dot code
Type B retroviruses                   HERV-K(HML-1)   HML-1 (U35102)            1A
                                                      Seq29 (S77579)            1B
                                      HERV-K(HML-2)   HERV-K10 (M14123)         2A
                                                      HERV clone M3.5 (U87592)  2B
                                      HERV-K(HML-3)   HML-3 (U35236)            3A
                                                      HERV 1 (S66676)           3B
                                                      RT244 (S77583)            3C
                                                      Seq26^a                   3D
                                                      Seq34^a                   4A
                                                      Seq42^a                   4B
                                                      Seq43^a                   4C
                                      HERV-K(HML-4)   HERV-K-T47D (AF020092)    5A
                                                      Seq05^a                   5B
                                                      Seq10^a                   5C
                                      HERV-K(HML-5)   HML-5 (U35161)            6A
                                      HERV-K(HML-6)   HML-6 (U60269)            7A
                                                      Seq38^a                   7B
                                                      Seq56^a                   7C
                                      HERV-K(C4)      HERV-K-C4 (U07856)        8A
                                                      Seq31^a                   8B
                                      Unassigned      SeqU39937 (U39937)        5F
Type C retroviruses                   HERV-H          SeqG46.2 (AF026252)       2J, 2K^b
                                                      Seq61^a                   2K, 5J^b
                                                      Seq66^a                   2L, 5K^b
                                      ERV9/HERV-W     ERV9 (X57147)             4E
                                                      Seq49^a                   4F
                                                      Seq59^a                   4G
                                                      Seq60^a                   4H
                                                      Seq63^a                   4I
                                                      Seq64^a                   4J
                                                      HERV-W (AF009668)         4L
                                      ERV-FRD         ERV-FRD (U27240)          3E
                                                      Seq46^a                   3F
                                      HERV-ERI        HERV-E(4-1) (M10976)      2H
                                                      Seq32^a                   2I
                                      HERV-IP         HERV-I (M92067)           3H
                                                      HERV-IP-T47D (U27241)     3I
                                                      Seq65^a                   3J
                                      HERV-T          S71 pCRTK1 (U12970)       2E
Type D retroviruses                   MPMV            S71 pCRTK6 (U12969)       2F
                                                      Seq36^a                   5H
Foamy virus related                                   HERV-L (G895836)          1E
                                                      Seq39^a                   1F
                                                      Seq40^a                   1G
                                                      Seq45^a                   1H
                                                      Seq48^a                   1I
                                                      Seq51^a                   1J
Unassigned human retroviral elements                  Seq58^a                   1K
                                                      Seq35^c                   5G
                                                      Seq41^a                   5I
Human nonviral retroposon                             Seq77^a                   5J
                                                      LINE-1 (M80343)           3L
Human exogenous retroviruses                          HRV5 (U46939)             6E
                                                      Foamy virus (Y07725)      6F
                                                      HTLV-I^c                  6G
                                                      HTLV-II (M10060)          6H
                                                      HIV-1^c                   6I
                                                      HIV-2 (J04542)
Mammalian endogenous retroviruses                     MMTV (M15122)             7E
                                                      PERV (AF038600)           7F
                                                      BaEV (D10032)             7G
                                                      GaLV (M26927)             7H
                                                      Mo-MuLV (J02255)          7I
                                                      MPMV (M12349)             7J

^a From Ref. 21.
^b Filter code corresponding to Fig. 4 only.
^c From Ref. 22.
FIG. 2. Flow chart of the RT-PCR/RDBH procedure described in Materials and Methods. In general, samples from any biological source (e.g., all types of body fluids, tissues, cells, and cell culture supernatants) can be tested for the presence of retroviral nucleic acids.
FIG. 3. HERV expression pattern in human PBMCs of a healthy blood donor. Reverse dot blots were probed under standardized conditions (as described in Materials and Methods) with DNA fragments amplified from PBMC-derived cDNA with primer set MOP-1 (A), with primer set MOP-2 (B), and with an equimolar mixture of the amplification products obtained with MOP-1 and MOP-2 primers (C). For fast assignment of retrovirus-specific oligonucleotides compare with (D); for origin and exact identification see Table 1.
Reverse dot-blot hybridization

In the second step, the amplified products were identified by reverse dot-blot hybridization (RDBH) analysis (Fig. 2). This method is used to sort all products of the PCR amplification unambiguously. PCR amplification permits relaxed primer-template binding. RDBH, by contrast, discriminates strictly between PCR products, so any preceding false amplification of non-RT-related sequences becomes irrelevant. The high stringency of RDBH was achieved by using synthetic HERV-specific oligonucleotides spotted onto the filter membranes (Table 1). These oligonucleotides correspond to the pol sequences amplified with the MOP-1 and MOP-2 primer sets, except that they lack the primer sequences themselves. Therefore, hybridization specificity is due solely to the amplified sequence between the described RT core homology motifs. Thus, under high-stringency conditions, even closely related retroviral sequences can be identified exactly.
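The logic of the preceding paragraph can be illustrated as follows: primer-derived ends are removed, only the internal region is compared with each spotted probe, and a dot is called only above a stringency cutoff. The sketch uses ungapped percent identity as a crude stand-in for hybridization stringency. It is not a model of duplex thermodynamics. All sequences and cutoffs are hypothetical.

```python
# Sketch of high-stringency sorting: only the internal region (primer sequences
# removed) is compared with each spotted probe, and a dot is called only above an
# identity cutoff. Ungapped percent identity is a crude stand-in for hybridization
# stringency; all sequences and the cutoff are hypothetical.

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def strip_primers(amplicon: str, fwd_len: int, rev_len: int) -> str:
    """Drop the primer-derived ends so only the internal region remains."""
    return amplicon[fwd_len:len(amplicon) - rev_len]


def assign(internal: str, probes: dict, min_identity: float = 0.95) -> list:
    """Return the dot codes whose probe matches the internal region above the cutoff."""
    return sorted(code for code, probe in probes.items()
                  if identity(internal, probe) >= min_identity)


probes = {"1A": "ACGTACGTAC", "2A": "ACGTACGTTT", "4E": "TTTTACGTAC"}
internal = strip_primers("GGG" + "ACGTACGTAC" + "CC", fwd_len=3, rev_len=2)
print(assign(internal, probes))                     # ['1A']
print(assign(internal, probes, min_identity=0.75))  # ['1A', '2A']
```

Lowering the cutoff admits the related probe 2A. This mirrors how relaxed hybridization conditions allow cross-hybridization between related sequences.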

In this study we selected 61 retrovirus-specific oligonucleotides for RDBH analysis, corresponding to pol genes of representative members of all known human exogenous and endogenous retrovirus families. In addition, six mammalian retroviruses were included. The origin and taxonomic classification of all sequences used are summarized in Table 1.

HERV transcription pattern in human peripheral 
blood mononuclear cells 

To assess the feasibility and specificity of the PCR/RDBH assay system, HERV transcription was analyzed in human PBMCs (Fig. 3). When the MOP-1 primer set alone was used for amplification, almost exclusively type B-related HERVs were detected. These were particularly members of the HERV-K subgroups HML-2, -3, -4, -6, and -C4 (Fig. 3A; dots 2A and 2B, dots 3A-3D and 4A-4C, dots 5A-5C, dots 7A and 7B, and dots 8A and 8B) and a not yet classified HERV-K-related sequence (dot 5F). This expression pattern concurs with previously published studies demonstrating differential expression of HERV-K elements in human tissues (refs. 18, 19). No cross-hybridization was observed with type C-related HERVs. Low amounts of product were obtained for one of the human foamy virus-related HERV-L elements (Fig. 3A, dot 1E).

With MOP-2 primers, HERV-E-related elements (Fig. 3B; dots 2H and 2I), sequences of the HERV-L family (dots 1E-1K), and ERV9-related HERVs (dots 4E-4G and 4I) were preferentially amplified. Some HERV-K(HML-4)- and HERV-K(HML-6)-related sequences were also present in the hybridization probe. The same amount of radioactively labeled probe was used in all hybridization reactions. Nevertheless, genomic control DNA (dots 8E-8H) gave much stronger signals with the MOP-2-amplified probe than with the MOP-1 probe, indicating that the human genome contains significantly more copies of type C-related than type B-related HERV elements.

To detect all retroviral sequences in a single experiment, MOP-1 and MOP-2 primers were first added to the PCR in an equimolar ratio. However, this resulted in predominant amplification of type C-related sequences, with ABD-type sequences underrepresented (data not shown). Therefore, we performed separate PCRs with either the MOP-1 or the MOP-2 primer set and mixed the purified amplification products of both reactions in equimolar amounts. This procedure gave the signal pattern expected from combining both primer sets (Fig. 3C) and corresponds roughly to the amount of type B- and type C-related HERV transcripts
FIG. 4. HERV expression pattern in T47D cells after steroid treatment (A) and HERV transcripts packaged in T47D particles (B). DNA probes for reverse dot-blot hybridization were generated by reverse transcription of mRNA isolated from steroid-treated T47D cells (A) and from T47D particles (B), respectively. For fast assignment of retrovirus-specific oligonucleotides compare with (C); for origin and exact identification see Table 1.
primer design, two of the most highly conserved motifs within the reverse transcriptase-encoding region of the pol gene were exploited. The usefulness of these conserved motifs for detection of novel retroviruses has been demonstrated in pioneering work by several authors (refs. 15, 16, 21, 22). In the first step of our experimental approach, all retroviral templates of the sample investigated are amplified; this step can be considered a "multiplex" PCR under moderate primer-template annealing conditions. In the second step, the amplified products are sorted exactly by RDBH under high-stringency conditions. With this highly sensitive and species-specific diagnostic tool, we could detect as few as 10 PERV DNA copies contained in cDNA derived from 25 ng of human PBMC mRNA.
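Sensitivity tests of this kind require a known number of template copies in the reaction. The sketch below shows the usual mass-to-copy conversion for such a spike. It relies on the common approximation of 650 g/mol per base pair of dsDNA and a hypothetical 5-kb template. This section does not describe how the PERV copy standard was actually prepared.

```python
# Sketch: converting between DNA mass and copy number, e.g. to prepare a spike of
# about 10 template copies for a sensitivity test. Uses the common approximation of
# 650 g/mol per base pair of dsDNA; the 5-kb template length is hypothetical.

AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate mean mass of one double-stranded base pair


def copies_from_mass(mass_ng: float, length_bp: int) -> float:
    """Number of dsDNA molecules in mass_ng nanograms of a length_bp template."""
    return mass_ng * 1e-9 * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


def mass_for_copies(copies: float, length_bp: int) -> float:
    """Mass in nanograms that contains the requested number of copies."""
    return copies * length_bp * BP_MASS_G_PER_MOL / AVOGADRO * 1e9


stock_copies_per_ul = copies_from_mass(1.0, 5000)  # a 1 ng/ul stock
dilution = stock_copies_per_ul / 10                # fold dilution to reach 10 copies/ul
print(f"{stock_copies_per_ul:.3g} copies/ul, dilute {dilution:.3g}-fold")
```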

However, it is important to emphasize that the PCR/RDBH assay is primarily a qualitative detection technique. The differing intensities of the autoradiographic signals may give a strong impression of quantitative monitoring of retrovirus expression. Several sources of uncertainty, however, may produce a signal pattern that differs from the true expression levels in the sample. The use of highly degenerate primer sets combined with relaxed primer-template binding conditions may lead to preferential amplification of certain "high copy" or a few "best fit" templates, whereas others remain underrepresented. This effect increases with the number of PCR cycles performed and becomes critical above 35 cycles. Thus, no more than 30 rounds of PCR should be performed. The multiplex PCR does not allow internal standardization except for overall hybridization efficacy and autoradiograph exposure time. RT-related transcripts identified by PCR/RDBH must therefore be quantified in further experiments by conventional methods such as Northern blotting or by a specific competitive PCR established for the retroviral sequence of interest.
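An idealized exponential model shows why the bias toward "best fit" templates grows with cycle number. Two templates start at equal copy numbers but are primed with slightly different efficiencies. The efficiencies below are hypothetical, not measured values from this assay. The model also omits the plateau phase, which further distorts late-cycle product ratios.

```python
# Sketch: why "best fit" templates come to dominate as cycle number grows.
# Idealized exponential model N = N0 * (1 + E)**n with no plateau; the two
# efficiencies below are hypothetical, not measured values from this assay.

def amplified(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` rounds at per-cycle efficiency E (0 to 1)."""
    return initial_copies * (1.0 + efficiency) ** cycles


def bias_ratio(eff_good: float, eff_poor: float, cycles: int) -> float:
    """Product ratio of two templates that start at equal copy number."""
    return amplified(1.0, eff_good, cycles) / amplified(1.0, eff_poor, cycles)


for n in (20, 30, 35, 40):
    print(n, round(bias_ratio(0.95, 0.85, n), 1))
```

In this model the ratio grows geometrically with n. Limiting the number of cycles therefore limits how far the signal pattern can drift from the starting template ratio.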

On the other hand, the PCR/RDBH assay has an advantageous feature: its highly degenerate primers, together with the cross-hybridization that occurs when hybridization stringency is lowered, allow isolation and characterization of as yet unknown retroviral sequences. DNA hybridizing to the covalently bound oligonucleotides can be eluted from the filter membrane by alkaline denaturation and reamplified to provide sufficient double-stranded DNA for cloning and subsequent sequence analysis.

With nonradioactive labeling techniques, the PCR/RDBH assay could become an automatable procedure for rapid analysis of retroviral expression. DNA chip technology may be applied to facilitate handling and increase the efficacy of reverse dot-blot filter membranes. Computer-assisted evaluation of RDBH results by phosphor/fluorescence imaging systems may further improve visualization. Another advantage of the PCR/RDBH assay system is that neither the number nor the origin of the retroviral RT sequences to be tested is limited. Novel RT-encoding sequences can easily be added to the filter arrays without any modification of the experimental design. Moreover, by modifying the hybridization conditions, PCR/RDBH can be used to search for new exogenous or endogenous retroviruses with only weak homology to already known families.
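Computer-assisted evaluation could start from a simple rule: a dot is called positive when its signal reaches a fixed multiple of the background. The sketch below estimates background from dots assumed to be empty. The intensity export format, the choice of empty positions, and the 3-fold cutoff are all illustrative assumptions and do not come from the paper.

```python
# Sketch of computer-assisted scoring of an RDBH filter image: a dot is called
# positive when its signal reaches a fixed multiple of the background estimated
# from dots known to be empty. The intensity format, the empty positions, and
# the 3-fold cutoff are all assumptions for illustration.
from statistics import median


def call_dots(intensities: dict, empty_codes: list, fold: float = 3.0) -> list:
    """Return dot codes whose signal is at least `fold` times the median background."""
    background = median(intensities[code] for code in empty_codes)
    threshold = fold * background
    return sorted(code for code, value in intensities.items()
                  if code not in empty_codes and value >= threshold)


scan = {"1A": 900.0, "1B": 40.0, "8E": 5000.0, "6L": 30.0, "8L": 50.0}
print(call_dots(scan, empty_codes=["6L", "8L"]))  # ['1A', '8E']
```

A fold-over-background rule produces only qualitative calls, which matches the qualitative nature of the assay stressed above.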

In summary, the PCR/RDBH assay is a powerful technique for precise qualitative analysis of retrovirus activity in biological samples. The number and types of retroviral sequences to be identified are determined solely by the number and types of synthetic oligonucleotides spotted onto the reverse dot-blot membranes. PCR/RDBH could be useful in protecting patients against undesired transmission of genetic material by retroviruses from therapeutic protein preparations, in gene therapy, and in xenotransplantation.